Augmented classical least squares multivariate spectral analysis

ABSTRACT

A method of multivariate spectral analysis, termed augmented classical least squares (ACLS), provides an improved CLS calibration model when unmodeled sources of spectral variation are contained in a calibration sample set. The ACLS methods use information derived from component or spectral residuals during the CLS calibration to provide an improved calibration-augmented CLS model. The ACLS methods are based on CLS so that they retain the qualitative benefits of CLS, yet they have the flexibility of PLS and other hybrid techniques in that they can define a prediction model even with unmodeled sources of spectral variation that are not explicitly included in the calibration model. The unmodeled sources of spectral variation may be unknown constituents, constituents with unknown concentrations, nonlinear responses, non-uniform and correlated errors, or other sources of spectral variation that are present in the calibration sample spectra. Also, since the various ACLS methods are based on CLS, they can incorporate the new prediction-augmented CLS (PACLS) method of updating the prediction model for new sources of spectral variation contained in the prediction sample set without having to return to the calibration process. The ACLS methods can also be applied to alternating least squares models. The ACLS methods can be applied to all types of multivariate data.

RELATED INVENTIONS

[0001] This application claims the benefit of Provisional ApplicationNo. 60/309,619, filed on Aug. 1, 2001, and Provisional Application No.60/311,755, filed on Aug. 9, 2001.

GOVERNMENT RIGHTS

[0002] The Government has rights to this invention pursuant to ContractNo. DE-AC04-94AL85000 awarded by the U.S. Department of Energy.

FIELD OF THE INVENTION

[0003] The present invention relates generally to the field ofmultivariate spectral analysis and, more particularly, to a method foraugmenting a classical least squares calibration model to provideimproved predictions of component values in unknown samples havingunmodeled sources of spectral variation.

BACKGROUND OF THE INVENTION

[0004] Over the past 20 years, quantitative multivariate spectralanalysis has primarily shifted from the explicit classical least squares(CLS) method to the implicit principal component regression (PCR) andpartial least squares (PLS) methods. The principle motivation for thisshift is that CLS is based on an explicit linear additive model, e.g.,the Beer-Lambert law. As such, CLS has the significant limitation thatit requires the concentrations of all spectrally active components beknown and included in the calibration model before an adequateprediction model can be developed. On the other hand, the PCR and PLSmethods can achieve excellent predictions for multivariate spectral datasets where all of the spectrally active components have not beendetermined. Consequently, CLS has been relegated to solving a small setof well-defined linear problems with known spectrally active componentsthat adhere to the Beer-Lambert law, e.g., infrared spectra of gas-phasesamples.

[0005] Nevertheless, PCR and PLS do not have the qualitativecapabilities of CLS since they do not generate explicit estimatedpure-component spectra that can be readily interpreted. Also, they arenot well suited to the advantages of a newly developedprediction-augmented CLS (PACLS) technique as set forth in U.S. Pat. No.6,415,233, which is incorporated by reference herein. The PACLSalgorithm provides a basis for rapidly updating a CLS model duringprediction of component values of the target unknown sample. PACLS addsspectral shapes (i.e., spectral intensity information) to the CLSestimate of the pure-component spectra during prediction to account forspectrally active components or other spectral effects present in theprediction samples that were not modeled during calibration. PACLSallows CLS models to be updated for the presence of spectrometer drift,changes in spectrometer parts or changes in whole spectrometers,unmodeled chemical or non-chemical spectral components, as well asupdating for more generalized changes such as changes in startingmaterials, the presence of nonlinearities, chromatic aberrations, orstray light, etc.

[0006] However, the PACLS algorithm is limited by the fact that accuratepredictions require all interfering spectral components (includingchemical and non-chemical sources of spectral variation) be explicitlyincluded during calibration or prediction. If one or more spectralinterferences were left out of the calibration, then their spectralinfluence would have to be explicitly added during prediction to correctfor their absence in the calibration model.

[0007] These limitations of the CLS model can be reduced and eveneliminated by the development of a new generalized family of algorithms,hereinafter referred to as augmented classical least squares (ACLS). TheACLS model uses information derived from component values and spectralresiduals during the CLS calibration to provide an improvedcalibration-augmented CLS model. When the new ACLS methods are combinedwith the PACLS prediction algorithm, a powerful set of new multivariatecapabilities is realized such that analyses can be performed withincomplete knowledge of interferences in the calibration and theprediction data.

[0008] The present invention further provides a generalization of ACLSmethods for analyzing multivariate spectral data. Specific embodimentsof the generalized ACLS methods are: spectral-residual augmentedclassical least squares (SRACLS), scores augmented classical leastsquares (SACLS), and concentration-residual augmented classical leastsquares (CRACLS) methods which all allow one to overcome the abovedeficiencies. The SRACLS, SACLS, and CRACLS methods are based on CLS sothat they retain the qualitative benefits of CLS, yet they have theflexibility of PLS and other hybrid techniques in that they can define aprediction model even with unmodeled sources of spectral variation thatare not explicitly included in the calibration model. The unmodeledsources of spectral variation may be unknown constituents, constituentswith unknown concentrations, nonlinear responses, non-uniform andcorrelated errors, or other sources of spectral variation (e.g.,temperature, spectrometer drift, etc.) that are present in thecalibration sample spectra.

[0009] Augmentation can also be applied to constrained alternatingclassical least squares methods (alternating between CLS calibration andCLS prediction) that are used when the reference variables, such as thepure-component spectra or component concentrations, are inadequatelyknown for standard CLS method. The ACLS methods of the present inventioncan improve the component identification with such inadequately knowndata sets.

[0010] Combining the present invention with the PACLS technique resultsin prediction models that are generally comparable or better inprediction ability to the standard PLS models. Also, since the variousACLS methods are based on CLS and unlike PLS, they can incorporate thePACLS feature of updating the prediction model for new sources ofspectral variation without the need for time-consuming recalibration.These updated prediction models only require spectral information whilePLS requires spectral and concentration information duringrecalibration. The present invention is not restricted to usingcontinuous spectral information, but can also use any set ofdiscontinuous spectral intensities that are selected in the calibrationfor the least squares analysis. Finally, the present invention generatesbetter qualitative information about the analytes by generating betterestimates of their pure-component spectra.

SUMMARY OF THE INVENTION

[0011] A method of multivariate spectral analysis is provided that isable to generate accurate and precise prediction models frommultivariate spectral data which includes unmodeled spectrally activecomponents present in the calibration model. The method provides theimproved qualitative information of CLS methods as well as thequantitative prediction ability of the implicit multivariate calibrationmethods by augmenting the calibration model with a measure of theresidual resulting from unmodeled components.

[0012] The present method of multivariate spectral analysis is mostuseful when an underlying calibration model includes unmodeled sourcesof spectral variation. In particular, reference variables (e.g., samplespectra and component values) are obtained for a set of calibrationsamples. An estimate is obtained for at least one reference variable forthe set of calibration samples. Thereafter, a residual is obtainedbetween the at least one reference variable and its correspondingestimated variable. A measure of the residual between the estimated andreference variable is used to augment its corresponding referencevariable. Using the augmented reference variable, a value of at leastone component in a set of unknown samples is predicted. Augmentation canbe repeated until all sources of spectral variation are accounted for inthe calibration samples. The iterative process allows the development ofa CLS-type calibration model comparable in its quantitative predictionability to implicit multivariate calibration methods, even whenunmodeled spectrally active components are present in the calibrationsample spectra.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013]FIG. 1 depicts pictorially Eq. (16).

[0014]FIG. 2 depicts pictorially Eq. (17).

[0015]FIG. 3A is a flow diagram of the SRACLS embodiment of the presentinvention. FIG. 3B is a flow diagram of the SACLS embodiment of thepresent invention.

[0016]FIG. 4 is a flow diagram of the CRACLS embodiment of the presentinvention.

[0017]FIG. 5 depicts four loading vectors (solid lines) estimated fromsimulated spectra using the CRACLS calibration that includes urea andwater concentrations and two concentration residual augmentations. Thepure-component spectra (dotted lines) for water, glucose, urea, andethanol are placed close to the corresponding loading vector (LV) forshape comparison.

[0018]FIG. 6 depicts mean-centered simulated spectra showing highfrequency noise.

[0019]FIG. 7 depicts shape comparisons of the first weight loadingvector (WLV) of a PLS glucose calibration and the first loading vectorof a CLS calibration including glucose and water using the simulateddata with the pure-component spectrum for glucose.

[0020]FIG. 8A depicts the mean spectrum and 8B depicts the mean-centeredspectra of the calibration spectra.

[0021]FIG. 9A depicts the difference between average repeat spectra forthe calibration and prediction data sets and 9B depicts themean-centered average repeat spectra from the prediction data set.

[0022]FIGS. 10A, 10B and 10C depict the ethanol, urea and waterconcentration residuals, respectively, vs. the reference glucoseconcentration from the CLS model applied to the simulated data whenleaving out glucose concentrations.

[0023]FIG. 11 is a flow diagram of the ACLS method of the presentinvention as applied to an alternating least squares model.

[0024]FIG. 12 shows a video image of the thin neoprene sample aged for 6days in air at 140° C.

[0025]FIG. 13A depicts spectra of neoprene pixels from spectral image ofthe neoprene sample. FIG. 13B depicts spectra of neoprene bands fromspectral image of the sample after removing a linear baseline from eachof the 6 neoprene related spectral bands.

[0026]FIG. 14 shows examples of systematic artifacts in the spectralimage scores FIG. 14A shows scores from factor 5 of an interferogram.FIG. 14B shows scores from factor 10 of the negative log of a samplesingle beam. FIG. 14C shows scores from factor 2 of two ratioedbackground single-beam spectra after conversion to absorbance. FIG. 14Dshows scores from factor 3 from same two ratioed background single-beamspectra after conversion to absorbance.

[0027]FIG. 15 depicts mean-centered repeat spectra from the neoprenepixels in the spectral image.

[0028]FIG. 16A shows the covariance error structure of the mean-centeredrepeat spectra. FIG. 16B shows the ideal error covariance structureassumed for standard least-squares analyses.

[0029]FIG. 17 depicts the experimental and theoretical estimates for theweighting function. The upper trace is the inverse of the square root ofthe variance component from the mean-centered repeat spectra (leftaxis). The lower trace is the average single-beam spectrum of theneoprene pixels (right axis).

[0030]FIG. 18 depicts the estimated pure-component spectra usingstandard non-negativity constrained MCR.

[0031]FIG. 19 depicts the concentration maps for components 1, 2, and 3obtained from the standard non-negativity constrained MCR.

[0032]FIG. 20 depicts the estimated pure-component spectra obtained byaugmenting with the first five scores from the mean-centered repeatspectra and using non-negativity and equality constrained MCR.

[0033]FIG. 21 depicts the concentration maps for components 1, 2, and 3obtained by augmenting with the first five eigenvectors from themean-centered repeat spectra and using non-negativity and equalityconstrained MCR.

[0034]FIG. 22 depicts the estimated pure-component spectra obtained byaugmenting with the first five sets of scores and eigenvectors from themean-centered repeat spectra and using non-negativity and equalityconstrained MCR.

[0035]FIG. 23 depicts the concentration maps for components 1, 2, and 3obtained by augmenting with the first five sets of scores andeigenvectors from the mean-centered repeat spectra and usingnon-negativity and equality constrained MCR.

[0036]FIG. 24 depicts the differences in the concentration maps ofcomponents 1, 2, and 3 from the standard MCR and MCR augmented with thefirst five sets of scores and eigenvectors from the mean-centered repeatimage spectra of the neoprene pixels.

DETAILED DESCRIPTION OF THE INVENTION

[0037] To better understand the present invention, the followingintroductory discussion is provided. The notation in the equations belowuses upper-case bold letters for matrices, lower-case bold letters forvectors, and italicized letters for scalars. We use the {circumflex over( )} to indicate estimated values, ^(T) to denote a transposed matrix,⁻¹ for matrix inversion, ⁺ for the pseudoinverse of a matrix, and {tildeover ()} for augmented matrices. The use of the pseudoinverse can resultin improved numerical precision from a variety of methods, e.g. singularvalue decomposition. Vectors are column vectors with row vectorsindicated by the transpose of the vector.

[0038] As used herein, the term multivariate spectral data includes alltypes of multivariate data (i.e., data related to each sample or objectis composed of responses from two or more variables). As used herein,spectral data to be used in the multivariate spectral analysis areconsidered to be any form of data where there are responses at two ormore measured variables (e.g., wavelength, time, retention time,frequency, etc.) due to one or more perturbation sources (e.g.,electromagnetic radiation, vibration, temperature, mechanical impulses,etc.). Therefore, the present invention can be applied to conventionalspectroscopic methods that include, but are not limited to, infrared,near-infrared, visible, ultraviolet, X-ray, gamma-ray, Raman, massspectroscopy, ion-mobility mass spectroscopy, Auger, fluorescence,phosphorescence, ESCA, far-infrared, microwave, x-ray fluorescence, NMR,energy loss spectroscopy, EDAX, ESR, and multi- and hyper-spectralimaging spectroscopy. These spectroscopic methods can be used inabsorption, transmission, reflection, or emission modes. In addition,the methods of the present invention can be applied to other forms ofmultivariate data that include, but are not limited to, seismic data,chromatographic data, thermal gravimetric analysis, image data, andresponses from multiple sensors. In particular, the methods of thepresent invention can be used to provide quantitative analysis of imagedata of the type provided with recently available commercialhyper-spectral imaging spectrometers. The methods can also be applied tothe analysis of multispectral and hyperspectral image data obtained fromremote sensors, such as satellite or airborne hyperspectral imagingsensors.

Classical Least Squares

[0039] The basic CLS method can be described by the followingrelationship:

A=CK+E _(A)  (1)

[0040] where A is a n×p matrix of measured spectra from n calibrationsamples at p frequencies, C is a n×m matrix of reference componentvalues for each of the m spectrally active components, K is a m×p matrixof in situ pure-component spectra scaled to unit concentration andpathlength for each component m, and E_(A) is a n×p matrix of spectralresiduals (i.e., model errors, if the model is not linear or if K doesnot include all the pure-component spectra, and spectral noise) for eachfrequency p. Although a component can typically be a chemical speciesand a component value can typically be a concentration of the chemicalspecies, the term component as used herein generally applies to anysource of spectral variation, be it chemical, physical or otherwise.Examples of sources of spectral variation that can be modeled includethe effect of the individual chemical species, chemical interactions, orany changes caused by a wide variety of physical parameters which caninduce spectral variations, such as temperature variations, spectrometerdrift, humidity changes, and sample insertions. Note that the physicalproperty of temperature can have a considerable quantifiable effect onthe sample spectra as can be seen from samples containing an aqueoussolvent. The CLS calibration step requires determining the least-squaresestimate for pure component spectra from a set of calibration sampleswith known component values. These estimated pure-component spectra aregiven by:

{circumflex over (K)}=(C ^(T) C)⁻¹ C ^(T) A=C ⁺ A  (2)

[0041] where C is the matrix of reference component values for thespectrally active components in the calibration sample set and A is thematrix of measured spectra for the calibration sample set.

[0042] In a CLS prediction step, the estimated pure-component spectracan then be used to predict the component values in an unknown set ofsamples according to:

Ĉ=A _(P) {circumflex over (K)} ^(T)({circumflex over (K)}{circumflexover (K)} ^(T))⁻¹ =A _(P)({circumflex over (K)} ^(T))⁺  (3)

[0043] where now A_(P) is the matrix of measured spectra for the unknownsamples.

[0044] Eq. (2) provides accurate estimates of the known pure-componentspectra {circumflex over (K)}, and therefore accurate predictions of theunknown component values Ĉ, only if the data fit a linear model and Ccontains all the components that contribute to spectral variation in thecalibration sample spectra A. The accuracy of the estimatedpure-component spectra {circumflex over (K)} improves with eachadditional known reference value c_(i) added to the CLS model.Therefore, the present invention is most useful in analyzingmultivariate spectral data when there are unmodeled sources of spectralvariation in the calibration model.

Augmented Classical Least Squares

[0045] The present invention provides a family of methods for analyzingmultivariate data where there are unmodeled sources of spectralvariation in the calibration model. The family of methods can be broadlydescribed as first obtaining reference variables for a set ofcalibration samples (e.g., reference spectral data and reference valuesfor components in the set of calibration samples). An estimate isobtained for at least one of the reference variables for the set ofcalibration samples. Thereafter, a residual is obtained between the atleast one reference variable and its corresponding estimated variable.The residual between the estimated and reference variables is used toaugment its corresponding reference variable. Using the augmentedreference variable, a value of at least one component in a set ofunknown samples is predicted. Augmentation can be repeated until allsources of spectral variation are accounted for in the set ofcalibration samples. The iterative process allows the development of aCLS-type calibration model comparable in quantitative prediction abilityto implicit multivariate calibration methods even when unmodeledspectrally active components are present in the calibration samplespectra. The present invention generates an adaptable model that canachieve excellent quantitative and qualitative prediction results forsamples of unknown composition that contain unmodeled sources ofspectral variation.

[0046] In the spectral-residual-augmented classical least squares(SRACLS) embodiment of the present invention, a spectral residual matrixE_(A) can be determined from a CLS estimate of the pure-componentspectra {circumflex over (K)}, reference values C, and calibrationspectra A:

E _(A) =A−C{circumflex over (K)}  (4)

[0047] The rows of the estimated pure-component spectra {circumflex over(K)} matrix can be augmented directly with all or selected vectors fromthe spectral residual matrix E_(A) to correct the CLS model forunmodeled, spectrally active components. The augmented matrix$\begin{matrix}\hat{\overset{\sim}{K}} & \quad\end{matrix}$

[0048] can then be used in an augmented CLS prediction to obtainpredicted component values $\hat{\overset{\sim}{C}}$

[0049] for a prediction set of unknown samples: $\begin{matrix}{\hat{\overset{\sim}{C}} = {{A_{P}{\hat{\overset{\sim}{K}}\left( {\hat{\overset{\sim}{K}}{\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {A_{P}{\hat{\overset{\sim}{K}}}^{+}}}} & (5)\end{matrix}$

[0050] where now A_(P) represents the spectral matrix of the unknownsamples to be predicted.

[0051] For an incompletely specified and/or nonlinear model, thespectral residual E_(A) includes a set of correlated, non-uniform errorsthat can be further decomposed into a sum of correlated errors, due tounmodeled spectral components, nonlinearities, or system-relatedcorrelated errors; and uncorrelated random errors representing systemnoise and/or spectral variation that are not relevant to prediction.

[0052] Factor analysis methods can be applied to the spectral residualE_(A) to separate the sources of these errors. Thus, the spectralresidual E_(A) can be described as:

E _(A) =TP+E  (6)

[0053] where T and P are the set of n×r scores and r×p loading vectors,respectively, obtained from the factor analysis of the spectralresiduals E_(A), and E is the set of n×p random errors and spectralvariations not useful for prediction. The dimension r is the rank of thespectral residual matrix E_(A) or the number of factors that arerequired for optimal prediction. Therefore, Eq. (1) can be re-written asfollows:

A=CK+TP+E  (7)

[0054] In one embodiment of the present invention, Eq. (6) can be factoranalyzed by any factor analysis method. Common methods include principalcomponent analysis (PCA), partial least squares (PLS), or principalcomponent regression (PCR). If PLS or PCR is used, then concentrationerrors (to be discussed later) will be included in the calibrationprocess, as described in U.S. Pat. No. 6,341,257, which is incorporatedherein by reference. The factor analysis can use either an orthogonalfactor analysis method or any non-orthogonal factor analysis method.

[0055] Using only the spectral residuals in the factor analysis, thescores, T, or the loading vectors, P, can be used to augment the CLScalibration model. If PCA is used as the factor analysis method appliedto the calibration spectral residuals E_(A), then the eigenvectors, P(i.e., the rows of P), can augment the rows of the {circumflex over (K)}matrix using the SRACLS model to improve the prediction ability of theCLS model. The augmented CLS prediction for component values$\hat{\overset{\sim}{C}}$

[0056] can then be obtained according to Eq. (5). Martens and Naes(Multivariate Calibration; John Wiley & Sons: Chichester 1989) suggestedthis approach for removing the effects of unmodeled components. However,Martens and Naes did not use the augmented pure-component spectra$\begin{matrix}\hat{\overset{\sim}{K}} & \quad\end{matrix}$

[0057] in combination with a PACLS algorithm for prediction. Whencombined with PACLS (wherein spectral shapes obtained from data orinformation that are independent of the calibration data set can be usedto update the spectral responses), the SRACLS model can be updated in aPACLS prediction step for the presence of unmodeled changes in theresponses of the prediction samples.

[0058] The scores T can also be used to correct the CLS calibrationmodel for unknown spectral components. In this case, the columns of Tcan augment the column dimension of the component values C matrix toform an augmented component value matrix {tilde over (C)}.Re-calibrating with the augmented CLS method as in Eq. (8) results in anaugmented {tilde over ({circumflex over (K)})} matrix that allows theaugmented CLS model to have better predictive properties than theoriginal CLS model: $\begin{matrix}{\hat{\overset{\sim}{K}} = {{\left( {{\overset{\sim}{C}}^{T}\overset{\sim}{C}} \right)^{- 1}{\overset{\sim}{C}}^{T}A} \approx {{\overset{\sim}{C}}^{+}A}}} & (8)\end{matrix}$

[0059] When augmentation makes use of scores, this embodiment isreferred to as scores-augmented classical least squares (SACLS). Indeed,one can also simply augment the component values C matrix with one ormore columns of random numbers. In general, any set of vectors of thecorrect dimension for T can be used to improve the CLS model as long asthey each contain some independent information relative to themselves orthe original C matrix, i.e., they cannot be collinear with each other orC.

[0060] With both SRACLS and SACLS, the spectral residuals E_(A) can beobtained from the full calibration model or from cross-validatedcalibrations. The optimal number of augmentations in either SRACLS ofSACLS can be selected based on an F-test of the concentration residuals(using cross-validated concentration residuals) according to methodsdescribed by Haaland and Thomas, Anal. Chem. 60, 1193 (1988) and Haalandand Thomas, Anal. Chem. 60, 1202 (1988), both of which are incorporatedherein by reference.

[0061] Another ACLS method involves the use of component value residualsE_(C) (e.g., concentration residuals) to augment the component matrix C.When augmentation makes use of concentration residuals, the method isreferred to as concentration-residual-augmented classical least squares(CRACLS). In general, each estimated pure-component spectrum willconsist of the pure-component spectrum at unit reference value for thecorresponding known reference component in C plus linear combinations ofthe unmodeled pure-component spectra, i.e. $\begin{matrix}{{\hat{k}}_{j} = {k_{j} + {\sum\limits_{i = 0}^{m^{u}}\quad {b_{i}k_{i}^{u}}}}} & (9)\end{matrix}$

[0062] where {circumflex over (k)}_(j) represents the estimatedpure-component spectrum for the j^(th) component and b_(i) are thelinear scale factors for the m^(u) unmodeled pure-component spectrak_(i) ^(u). In the CRACLS method, these estimated pure-component spectra{circumflex over (k)}_(j) can then be used to estimate the referencecomponent values Ĉ as follows:

Ĉ=A{circumflex over (K)} ^(T)({circumflex over (K)}{circumflex over (K)}^(T))⁻¹ =A({circumflex over (K)} ^(T))⁺  (10)

[0063] An n×m component residuals matrix, E_(C), then is given by:

E _(C) =Ĉ−C  (11)

[0064] The component residuals E_(C) can be obtained from the entire setof calibration data after the CLS steps in Eqs. (9) and (10).Alternatively, the component residuals can be obtained fromcross-validated predictions on calibration samples rotated out duringthe cross-validation procedure or from prediction on a set of validationsamples that are not part of the calibration set.

[0065] To account for baseline variations, {circumflex over (K)} can beaugmented with explicit baseline functions, i.e., vectors representingthe potential variations in baselines in the spectra. These baselinefunctions can include an offset, general polynomials, orthogonalLegendre polynomials, or any expected functional form of the baselines.A row is added to {circumflex over (K)} for each additional order in thebaseline function added. Using the PACLS technique, estimated ormeasured spectral shapes for other spectrally active components can beadded to {circumflex over (K)} as well. A corresponding column must beadded to Ĉ for each added row in {circumflex over (K)} before solvingEq. (10). If rows have been added to {circumflex over (K)}, then E_(C)is computed using only the estimated reference values corresponding tothe original reference values in C.

[0066] To consider unmodeled sources of spectral variation, Eq. (1) canbe rewritten:

A=CK+C _(u) K _(u) +E  (12)

[0067] where C_(u) is an n×m^(u) matrix that represents the unknowncomponent values for each of the m^(u) unmodeled pure-component spectrain K_(u), and E represents the error remaining after removing m^(u)unmodeled linear spectral effects. Even if E_(A) in Eq. (1) is theresult of nonlinear factors, the nonlinearities can be estimated by alinear approximation in C_(u)K_(u), so even in those cases, E can beconsiderably smaller then E_(A).

[0068] The matrix, K_(u), can be decomposed into the sum of two terms.One part of K_(u) can be written as the projection of K_(u) onto thespace spanned by K, (i.e., P(K)K_(u)=DK), where D is a m^(u)×m matrixand m^(u) is the number of rows in K_(u) and m is the number of rows inK. Another part of the decomposition of K_(u) is orthogonal to K,denoted by G. The decomposition of K_(u) is then:

K _(u) =DK+G  (13)

[0069] Substituting Eq. (13) into Eq. (12) and gathering terms yields:

A=(C+C _(u) D)K+C _(u) G+E  (14)

[0070] The estimated component values Ĉ in Eq. (10) are thenapproximately equal to (C+C_(u)D). Consequently from Eq. (11), we have

E _(C) ≅C _(u) D  (15)

[0071] Although D is not known, Eq. (15) still shows that each column,e_(c), of E_(C) approximates a linear combination of the unknowncomponent values unless K_(u) is orthogonal to K, (i.e., D=0, where 0 isthe null matrix). However, if D=0, the unmodeled components will notcontaminate the estimated component values Ĉ so they can be ignoredwithout affecting the predicted component values.

[0072] In practice, D=0 will generally occur only if the spectralcomponents do not overlap in the spectral region being analyzed. Forcalibration of the known components, the magnitudes for unit componentvalue of the pure-component spectra for the unmodeled components are notrequired, since only the relative component values are needed togenerate the correct net-analyte signals (NAS) for each of the knowncomponents. By including linear combinations of the unmodeled componentvalues as additional columns in C and solving Eq. (16), the resultingadditional pure-component spectra {tilde over ({circumflex over (K)})}will be linear combinations of the spectra of the unmodeledpure-components, K_(u). When enough linear combinations are added tocover all sources of spectral variation beyond the noise, the NAS foreach of the known components in {tilde over ({circumflex over (K)})}will be correct.

[0073] Selecting one vector of the component residuals, e_(c), andaugmenting the original C matrix with the selected residual vector e_(c)as a new column creates the augmented n×(m+1) matrix, {tilde over (C)}.If there are more than one unmodeled spectrally active components, onlyone of the component residual vectors e_(c), is used in theaugmentation, since the residuals for the other unmodeled componentswill contain redundant information. Criteria for selecting whichcomponent residual vector e_(c) to use will be discussed in more detailbelow.

[0074] Using {tilde over (C)}, the augmented, (m+1)×p, pure-componentspectra matrix $\begin{matrix}\hat{\overset{\sim}{K}} & \quad\end{matrix}$

[0075] can be computed according to: $\begin{matrix}{\hat{\overset{\sim}{K}} = {{\left( {{\overset{\sim}{C}}^{T}\overset{\sim}{C}} \right)^{- 1}{\overset{\sim}{C}}^{T}A} = {{\overset{\sim}{C}}^{+}A}}} & (16)\end{matrix}$

[0076] This step of the CRACLS method is illustrated in FIG. 1 where thecomponent residual vector e_(c) used to augment the reference componentvalue matrix C are given as the e_(i)'s, and the additional estimatedpure-component spectrum in Eq. (16), is represented by the dotted line.

[0077] By iteration, additional augmentations can be used to removeadditional unmodeled sources of spectral variation. The new augmentedestimated pure-component spectra $\begin{matrix}\hat{\overset{\sim}{K}} & \quad\end{matrix}$

[0078] from Eq. (16) can then be used to again estimate referencecomponent values $\hat{\overset{\sim}{C}}$

[0079] for the calibration samples according to: $\begin{matrix}{\hat{\overset{\sim}{C}} = {{A{\hat{\overset{\sim}{K^{T}}}\left( {\hat{\overset{\sim}{K}}{\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {A\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}}} & (17)\end{matrix}$

[0080] where here A are the calibration spectra. This estimation step isshown in FIG. 2 where the c_(ei)'s represent the estimated componentvalues for the additional pure-component spectrum (i.e., the dottedline). Since the additional spectrum is a linear combination ofunspecified magnitude of the unmodeled pure-component spectra, thec_(ei)'s in $\hat{\overset{\sim}{C}}$

[0081] may not provide any useful quantitative information. A new n×mcomponent residual matrix, E_(C)′, then is given by:

E _(C) ′=Ĉ′−C  (18)

[0082] where Ĉ′ is an updated estimated component value matrixconsisting only of the estimated component values in$\hat{\overset{\sim}{C}}$

[0083] corresponding to the known reference values in C.

[0084] The steps delineated by Eqs. (11, 16-17) remove one linearcombination of the unmodeled spectral variations from the calibrationmodel. These steps can be repeated again using one row from E_(C)′ tofurther augment the component value matrix. To mitigate all theadditional sources of spectral variation (i.e., reduce the residual fromE_(A) in Eq. 1 to E in Eq. 14), these steps can be repeated for each ofthe independent sources of spectral variation present in the calibrationdata. In other words, the component residual vectors e_(c), can span thespace defined by the unmodeled spectrally active components. In allaugmented CLS methods, the optimal number of augmentations can bedetermined using the same method for PLS factor selection that has beenpreviously published for PLS and PCR. See, e.g., Haaland and Thomas,Anal. Chem. 60, 1193 (1988) and Haaland and Thomas, Anal. Chem. 60, 1202(1988). In this factor or vector selection method, a maximum number ofaugmentations greater than the optimum number of augmentations isselected, and an ACLS model is determined for each level of augmentationup to this maximum number using cross-validation techniques. The optimalnumber of augmentations is then determined based on the F-test usingcross-validated component residuals (from the full calibration model orfrom cross-validated calibration models) for each level of augmentation,similar to the procedure described by Haaland and Thomas. The componentresiduals E_(C) in the CRACLS method can be generated from either thefull models or the cross-validated models.

[0085] Using the fully augmented {tilde over (C)}, augmentedpure-component spectra $\hat{\overset{\sim}{K}}$

[0086] can be calculated according to Eq. (16). This augmentedpure-component spectral matrix $\hat{\overset{\sim}{K}}$

[0087] can then be used to predict component values$\hat{\overset{\sim}{C}}$

[0088] in the unknown samples according to Eq. (5).

[0089] Because any of the ACLS models represent a CLS-type model, theyare well suited to take advantage of the newly developed PACLStechnique. The PACLS algorithm further augments the augmentedpure-component spectra $\hat{\overset{\sim}{K}}$

[0090] prior to prediction with spectral information representingunmodeled sources of spectral variation present in the unknown samplespectra to be predicted. Basically, $\hat{\overset{\sim}{K}}$

[0091] from the calibration model can be further augmented prior to theprediction step in Eq. (5) with more pure-component spectra thatencompass spectrally active components in the prediction data set thatwere not present in the calibration model. The advantage of PACLS isthat the calibration model can be updated quickly during prediction toaccount for new sources of spectral variation. Recomputing thecalibration model using the original and the additional spectra, asrequired by the prior art, would take considerably more time.

[0092] The spectra added during prediction should account for allsources of spectral variation introduced during the collection of theprediction data that were not accounted for in the calibration model, asdescribed in the previously referenced U.S. Pat. No. 6,415,233. If thesesources of spectral variation can be attributed to a known component, asample could be doped with the identified component to determine aspectral shape that can be added to the prediction. However, often thenew sources of spectral variation (such as caused by instrument drift orsample insertion effects) cannot be so easily identified. Frequently, arepeat sample can be used to provide the required spectral informationto augment the model for prediction. By repeatedly measuring a stablesample during calibration and prediction spectral measurements, spectralshapes for these new sources of spectral variation can be captured. Thedifferences between repeated spectral measurements obtained during bothcalibration and prediction or an eigenvector analysis of thosedifferences can provide the spectral information required foraugmentation of the prediction model.

[0093] Note that, unlike PLS, the spectral variation can be incorporatedusing the PACLS method without knowing the values of various components.When using the repeat sample spectra, the assumption is that the spectracan be used to provide a reasonable estimate of the error covariancematrix, i.e., that the unmodeled sources of spectral variation affectthe repeat sample spectra and the prediction sample spectra in the sameway. This assumption is valid if the spectra are similar across allsamples, which is the case for the dilute aqueous solutions consideredherein. However, if the range of component values causes largevariations in the sample spectra, it may be necessary to use a subset ofrepeat samples to adequately capture the impact of the unknown sourcesof spectral variation.

[0094] The following discussion of the SRACLS method of the presentinvention is made with reference to FIG. 3A. The present inventioncomprises two steps: calibration (steps 10-90) and prediction (steps100-130).

[0095] Initially, a set of calibration samples containing a plurality ofspectrally active components is obtained at 10. A set of calibrationspectra A are obtained for some or all of the samples in the calibrationsample set at 20. Additionally, a set of reference component values C isconstructed for at least one of the spectrally active components in thecalibration sample at 30. Such component values C typically provide ameasure of the concentration of one or more known spectrally activecomponent within the set of calibration samples. However, those skilledin the art will appreciate that such component values can representother factors representative of the effects of both chemical andphysical phenomenon that are spectrally active (i.e., can alter thespectral data such as temperature). With the calibration spectra A andcomponent values C, Eq. (2) can be used at 40 to obtain CLS calibrationmodel estimates of the pure-component spectra {circumflex over (K)}.

[0096] At step 45, baseline variations as well as estimated or measuredspectral shapes representing other unmodeled components can be added tothe estimated pure-component spectra {circumflex over (K)}.

[0097] At step 60, spectral residuals E_(A) representative of thedifferences between the calibration spectra A and the estimatedcalibration spectra can be obtained according to Eq. (4) using theestimated pure-component spectra {circumflex over (K)}. The spectralresidual matrix E_(A) can be decomposed by factor analysis methods toseparate the sources of the spectral residuals.

[0098] The spectral residuals can be decomposed per Eq. (6). All orselected spectral residual vectors from E_(A) or P can be used toaugment the rows of the estimated pure-component spectra {circumflexover (K)} at 90 (i.e. to obtain augmented pure-component spectra$\hat{\overset{\sim}{K}}$

[0099] at least once or until all unmodeled, spectrally activecomponents in the calibration sample set have been accounted for. Any ofa variety of methods to using spectral or component value residuals,variation or other properties can be used to select the vectors foraugmentation.

[0100] In FIG. 3B is shown the SACLS method of the present invention.Here, the scores T can be used to correct the CLS model for unknowncomponents. In this case, the columns of T can be added to C at 80 toaugment the column dimension of C to form an augmented component valuematrix {tilde over (C)}. Re-calibrating with this augmented CLS methodas in Eq. (8) results in an augmented {tilde over ({circumflex over(K)})} matrix at step 90 that allows the augmented CLS calibration modelto have better predictive properties than the original CLS model. All orselected score vectors can be added to C and the augmented CLS modelrecalibrated at least once or until all unmodeled, spectrally activecomponents have been accounted for in the calibration set in theaugmented pure-component matrix $\hat{\overset{\sim}{K}}.$

[0101] Any of a variety of methods can be used to select the scores foraugmentation.

[0102] In FIGS. 3A and 3B, the augmented CLS prediction for componentvalues $\hat{\overset{\sim}{C}}$

[0103] of the unknown samples then proceeds in the prediction steps100-130. The SRACLS or SACLS estimate of the pure-component spectra$\hat{\overset{\sim}{K}}$

[0104] can be further augmented with spectral data using the PACLStechnique (wherein spectral shapes obtained from data or informationthat are independent of the calibration sample set can be used to updatethe spectral responses) and/or with baseline variations at step 100.Thereafter, at step 110, a predicted component value matrix$\hat{\overset{\sim}{C}}$

[0105] for a prediction set of unknown samples 130 can be obtainedaccording to Eq. (5), where now A_(P) in step 120 is the matrix of themeasured spectra for the unknown samples.

[0106] The following discussion of the CRACLS embodiment of the presentinvention is made with reference to FIG. 4. This embodiment alsocomprises two steps: calibration (steps 10-90) and prediction (steps100-130).

[0107] Initially, a set of calibration samples containing a plurality ofspectrally active components is obtained at 10. A set of calibrationspectra A is obtained for some or all of the samples in the calibrationsample set at 20. Additionally, a set of reference component values C isconstructed for at least one of the spectrally active components in thecalibration samples at 30. With the calibration spectra A and componentvalues C, Eq. (2) can be used at 40 to obtain a CLS calibration modelthat comprises estimates of the pure-component spectra {circumflex over(K)}.

[0108] At step 45, baseline variations as well as estimated or measuredspectral shapes representing unmodeled components can be added to theestimated pure-component spectra {circumflex over (K)}.

[0109] By extension at 50, an estimate of the component values Ĉ can beobtained with Eq. (10). At step 70, a component residual matrix E_(c)representative of the differences between the estimated component valuesĈ and the reference component values C can be obtained according to Eq.(11). The component residual matrix E_(c) can be decomposed into acomponent residual vector e_(c), for each component included in thecalibration.

[0110] Each component residual vector e_(c), is representative of atleast one unmodeled spectrally active component. A selected componentresidual vector e_(c), can augment the original component values matrixC at 80 to form an augmented component values matrix {tilde over (C)}.Alternatively, a first estimate of the residual vector can simply bedrawn from random numbers. The augmented component value matrix {tildeover (C)} can be used to obtain an augmented estimate of thepure-component spectra $\hat{\overset{\sim}{K}}$

[0111] at step 90 according to Eq. (16). Using the singly augmentedpure-component spectra $\hat{\overset{\sim}{K}},$

[0112] the CRACLS method can transition to prediction.

[0113] Alternatively, in another embodiment with iterations indicated bythe circular arrow, steps 85-70-80-90 can be repeated to further augmentthe pure-component spectra $\hat{\overset{\sim}{K}}.$

[0114] With iteration, an augmented component value matrix$\hat{\overset{\sim}{C}}$

[0115] is calculated according to Eq. (17), from which an updatedestimated component value matrix Ĉ′ can be obtained at step 85. Thisupdated component value matrix Ĉ′ can then be used to calculate a newcomponent residual matrix E_(C)′ according to Eq. (18) at step 70. Aselected component residual vector e_(c)′ can then be used to furtheraugment the augmented component values matrix {tilde over (C)} at step80, from which a new augmented pure-component spectral matrix$\hat{\overset{\sim}{K}}$

[0116] can be obtained at step 90. This iteration can be repeated untilall unmodeled, spectrally active components in the calibration sampleset have been accounted for.

[0117] After the desired number of augmentations, the augmented CLSprediction for component values $\hat{\overset{\sim}{C}}$

[0118] of the unknown samples then proceeds in the prediction steps100-130. The estimate of the pure-component spectra$\hat{\overset{\sim}{K}}$

[0119] can be further augmented with spectral shapes representingspectrally active components in the prediction samples that where notmodeled in calibration steps using the PACLS technique and/or withbaseline variations at step 100. Thereafter, at step 110, a predictedcomponent value matrix $\hat{\overset{\sim}{C}}$

[0120] for a prediction set of unknown samples 130 can be obtainedaccording to Eq. (5), where now A_(P) in step 120 is the matrix of themeasured spectra for the unknown samples.

EXAMPLE 1 CRACLS Method Applied to the Analysis of Non-Imaging Data

[0121] The CRACLS method of the present invention is demonstrated withboth simulated and real data derived from a system of dilute aqueoussolutions containing glucose, ethanol, and urea. The simulated datademonstrate the effectiveness of the CRACLS algorithm and help toelucidate the principles behind the method. Using experimental data, theprediction abilities of CRACLS and PLS are compared duringcross-validated calibration. In combination with PACLS, the CRACLSpredictions are comparable to PLS for the prediction of the glucose,ethanol, and urea components for validation samples collected whensignificant instrument drift was also present.

[0122] The experimental samples consisted of a series of dilute aqueoussolutions of glucose, ethanol, and urea each independently varied overthe concentration range of 0-500 mg/dL. The samples were prepared byweight and volume in a pseudo D-optimal design that allowed each of thethree analytes to be varied separately at 9 levels over theconcentration range. The aqueous solvent was obtained from a singlesource of buffered saline solution. A detailed error analysis indicatedthat the samples were made to an accuracy of better than 1 mg/dL. Thecalibration data set consisted of 27 samples plus a repeat sample takenfrom the center of the design, i.e., the repeat sample containedapproximately 250 mg/dL each of glucose, ethanol, and urea. The set ofsamples used for prediction (i.e., the validation samples) included thesame repeat sample and 27 new samples from a design similar to thecalibration set. The validation samples spanned the same 0 to 500 mg/dLconcentration range for each of the analytes as in the calibration set.No sample from the new 27-sample prediction set had the same compositionas any sample in the calibration set. Five of the prediction sampleswere removed from consideration when they were determined to be outliersamples contaminated by the epoxy used to seal the lids on the cuvettes.

[0123] The samples were sealed with a magnetic stirring bar in 10-mmpathlength cuvettes. A temperature controller propelled the magneticstirrer while holding the samples at a temperature constant to 0.05° C.(±1 σ). The samples were placed in the temperature controller and heldin the beam of the spectrometer for 4 minutes with stirring to allow thesample temperature to equilibrate. The near-infrared spectra of thesamples were obtained on a Nicolet Model 750 Fourier transform infrared(FT-IR) spectrometer. The spectrometer employed a 75 W tungsten-halogenlamp, quartz beam splitter, and liquid-N₂-cooled InSb detector. Spectraat 16 cm⁻¹ resolution were obtained after averaging the interferogramsover a 2-minute period.

[0124] The run order of the calibration sample set was randomized. Thespectrum of the repeat sample held at 32° C. was obtained after eachgroup of two calibration or prediction samples. The spectra for theprediction samples were obtained approximately one month after thecalibration spectra were obtained. Purge variation was introduced duringthe prediction data set to induce additional short-term instrument driftand to accentuate the difficulty in maintaining the calibration. Sampletemperature was varied in random order in 2° C. steps over the range of30 to 34° C. for both the calibration and prediction samples. Abackground of the empty sample holder was obtained after each sample.Transmittance spectra of the calibration and validation sample sets wereobtained by ratioing each single-beam sample spectrum to either itscorresponding background or to the average of the background for theday. The spectra were then converted to absorbance. Since the bestcalibration and prediction results were obtained with the spectraobtained using the averaged daily background, only the results obtainedfrom these spectra are described herein.

Simulated Data

[0125] To demonstrate the effectiveness of the present invention, asimulated spectral data set was constructed using pure-component spectraderived from experimental dilute aqueous data using standard CLSmethods. While these simulated spectra do not precisely match the truepure-component spectra (primarily because of significant baselinevariations resulting from spectrometer drift during the collection ofthe calibration spectra), they can be used to demonstrate thecapabilities of the present invention. The simulated spectral data wereconstructed from these pure-component spectra and a series of assumedconcentrations to generate two different sets of 25 samples, one forcalibration and the other for validation, containing 0-500 mg/dL each ofglucose, ethanol, and urea and 98000-100000 mg/dL of water. The sum ofall the components fro each sample was constrained to 100000. Inaddition, normally distributed random spectral noise was added to theabsorbance spectra at a level of 0.3% of the maximum spectral absorbanceintensity. This level of noise amounted to approximately 5% of theabsorbance caused by the minor components, which is slightly more noisethan we observed in the measured spectral data.

[0126] The spectral data were analyzed in the spectral region from 7500to 11,000 cm⁻¹ using the CLS, PLS and CRACLS models. The calibrationmodels for each technique were used for prediction of the validationsample spectra.

[0127] The pure-component spectra K of the analytes and the watersolvent used to generate the simulated spectral data are shown in FIG.5. The glucose pure-component spectrum is similar to that of a modifiedwater spectrum since glucose has no noticeable vibrational features ofits own in this region of the spectrum. Rather, the CLS-estimatedglucose spectrum is dominated by the interaction of glucose with thewater solvent. Consequently, building a multivariate spectral model toestimate glucose concentration in the samples is more difficult than forthe other analytes. On the other hand, both ethanol and urea havedistinctive spectral features (e.g., at 8500 cm⁻¹ and 9900 cm⁻¹,respectively) so they are more easily quantified.

[0128] Since water is the dominant component, the spectra of thesimulated samples closely resemble that of water. However, if we removethe average spectrum, the spectral variation due to the analytes of eachsimulated sample is apparent as shown in FIG. 6. Most of the spectralvariation shown is due to differences in the concentrations of theanalytes between samples, which are accentuated by the baselinevariations present in these pure-component spectra. The high-frequencycomponent superimposed on the spectra is the added noise.

[0129] The effect of augmentation on the prediction ability of theCRACLS model was examined for the simulation samples. If only two of thefour components, glucose and ethanol, are used to build a CLS modelbased on the simulated data, the cross-validated standard errors ofprediction (CVSEP) for glucose and ethanol are 234 mg and 93 mg,respectively. That is, there is almost no prediction ability forglucose. However, if a CRACLS model is built, augmenting the calibrationmodel with the concentration residuals twice, CVSEPs of 17 mg forglucose and 3 mg for ethanol are achieved, demonstrating that thepresent invention method has clear prediction ability for bothcomponents. If only one component's concentrations is included in theCRACLS calibration model, either glucose or ethanol, and the model isaugmented three times, we get essentially the same CVSEPs are obtainedas above. Adding the concentration residuals E_(c) to the referenceconcentration matrix C successfully removed contamination of theunmodeled components from the calibration model for the includedcomponents.

[0130] The CRACLS loading vectors qualitatively resemble thepure-component spectra. The estimated pure-component spectra or loadingvectors (LV) that resulted by including water and urea concentrations inthe calibration and augmenting with concentration residuals twice areshown as the solid lines in FIG. 5. The vectors have been shiftedslightly to show the comparisons more clearly with the pure-componentspectra (the dotted lines). Also, the loading vector for ethanol wasmultiplied by −1 to match the orientation of the pure-componentspectrum. The qualitative information of the loading vectors is apparentwhen compared with the actual pure-component shapes shown using thedotted lines. LV1 and LV2 closely resemble the pure-component spectra ofthe given components, urea and water, respectively. Also, even thoughethanol was not included in the calibration, LV3 resembles itspure-component spectrum. Because the glucose has weak vibrational bandsin this region, its spectral information is not apparent in any of theloading vectors, even in LV4.

[0131] If the major component is left out of the CLS calibration, thequalitative information will not be as apparent. The loading vectorswill generally match the shape of the major component due to itsoverwhelming influence. The extent of this influence on the simulateddata is shown by the decomposition of the vectors in the next section.With mean centering, the loading vectors obtained from a CLS calibrationwithout the solvent concentrations will reveal more of the distinctivespectral features of the non-solvent components. However, for a mixturesystem where the sum of the concentrations is constrained to a constant,the CLS estimated pure-component analyte spectrum represents the netchange in the sample spectrum due to a unit change in the analyteconcentration. Consequently, the estimated pure-component spectrum willinclude negative spectral changes due to the displacement of the solventthat will contaminate the mean-centered loading vectors.

[0132] In every case, the qualitative information in the loading vectorsfor CLS will be equal to or better than PLS, since the first PLSweight-loading vector that retains the best qualitative information isgenerated from only one component. For CLS, and therefore for CRACLS,the qualitative information is improved as additional analyteconcentrations are included in the calibration model. To illustratethis, a PLS calibration on glucose and a CLS calibration using onlyglucose and water were computed using the simulated data. The weightloading vector (WLV) from the PLS calibration and the loading vector(LV) from the CLS calibration for glucose are given in FIG. 7 as thedashed and dotted lines, respectively. The PLS weight loading vector iscontaminated by the displacement of the water by the analyte, so thespectral shape does not match the pure-component shape (solid line) verywell. The loading vector from a CLS calibration using only glucose andmean centering has the same shape. However, by including the waterconcentration, its contamination is removed, and the CLS first loadingvector as shown in FIG. 7 provides an improved match to thepure-component spectrum. The more information included in the CLScalibration, the better the fit. If the spectrometer drift has a linearcomponent, adding time of spectrum collection to the CLS concentrationmatrix will yield better pure-component spectra. Also CLS will generatebetter pure-component spectra than PLS when the known componentconcentrations are correlated.

[0133] The CRACLS loading vectors may be decomposed to demonstrate theeffective removal of unmodeled sources of spectral variation, even ifonly the minor components are included in the calibration. Thedecomposition of the first four CRACLS loading vectors provides insightinto this effectiveness when only the minor components of glucose andethanol are included in the calibration without mean centering. Bydefining a CLS prediction model using all four true pure-componentspectra with unit concentration, shown in FIG. 5, that were used tocreate the simulated data, the contribution of each of the true spectrato the loading vectors can be quantified as shown in Table I. TABLE IQuantity of pure-component spectra in CRACLS loading vectors usingglucose and ethanol concentrations and two augmentations. GlucoseEthanol Urea Water mg/dL mg/dL mg/dL mg/dL LV 1 1.0 0.0 0.4 196 LV 2 0.01.0 0.5 134 LV 3 0.0 0.0 −0.1 −150 LV 4 0.0 0.0 −1.0 −48

[0134] The spectral residuals, after removing the given amount for eachof the pure-component spectra, represent only random noise, indicatingthat all the spectral information contained in these loading vectors wasremoved. As shown in the Table I, the first and second loading vectorsare the only ones that contain the glucose and ethanol pure-componentspectra, respectively, corresponding to the order that theirconcentrations were placed in the concentration matrix. Note also thatthe pure-component spectra are present at unit concentration so they canprovide quantitative information during prediction. In addition, theurea and water spectra contaminate these first two estimatedpure-component spectra, as expected from Eq. (9).

[0135] In Table I, the amount of water far exceeds the other components,so all of the loading vector shapes are very similar to the pure waterspectrum. The last two estimated pure-component spectra are linearcombinations of only urea and water, the components left out of theconcentration matrix. Consequently, during prediction using these fourloading vectors, the resulting NAS for glucose and ethanol (i.e., thepart of each pure-component spectrum orthogonal to all other vectors) isthe same as obtained from the fully specified CLS model. Basically, thecontamination of both water and urea is removed from the prediction ofthe glucose and ethanol by the third and fourth loading vectors.Therefore, the prediction model correctly determines the glucose andethanol concentrations even though the concentrations for the other twocomponents were not explicitly included in the calibration.

[0136] Results from the Experimental Data

[0137] Now we consider the experimental data. The mean calibrationspectrum and the mean-centered calibration sample spectra are shown inFIGS. 8A and 8B, respectively. As with the simulated data, the waterspectral features dominate the spectra before mean centering. However,most of the spectral variations in the mean-centered spectra shown inFIG. 8B are not due to concentration differences, but rather are theresult of spectrometer drift and temperature variations. Due to theseadditional sources of variation, even a fully specified CLS model is notable to adequately model the data. This fact is clearly demonstrated inthe CVSEPs from the full CLS model of the real calibration data shown inthe first column of Table II. TABLE II Minor component CVSEPs for diluteaqueous solutions using CLS, PLS, and CRACLS calibration models. CRACLSCRACLS CRACLS CLS PLS^(a) (10)^(b) (15)^(b) (20)^(b) Glucose (mg/dl) 7115 (11) 18 16 15 Ethanol (mg/dl) 23  6 (11) 6 5 5 Urea (mg/dl) 18 6 (9)6 5 5

[0138] All the chemical components and temperature were included in themodel, and yet CLS still was unable to generate a precise calibrationmodel. These results indicate why CLS is seldom used for prediction inmultivariate calibrations. Also provided in Table II are the CVSEPs forPLS, with the optimum number of factors shown in parenthesis, and forCRACLS, using 10, 15 and 20 augmentations. Although there was a gradualdecrease in the CVSEPs with more augmentations, the optimal number ofaugmentations was not apparent. Differences between the presentinvention calibration results and the PLS results for these data are notstatistically significant.

[0139] To predict the validation data, information from the repeatsample was included in both the PLS and the CRACLS models. For PLS, allthe spectra from the repeat sample taken with the prediction data setwere added to the original calibration data set along with its componentconcentrations, and the PLS model was recalibrated. The cross-validatedcalibrations were performed initially by using the standard approach ofremoving all the data for each sample during the rotations (sample-outrotation), so that all the spectra for the repeat sample were removed atonce. The number of loading vectors or factors for the PLS model wasdetermined using an F-test on the cross-validated calibrations.

[0140] For the present invention, no changes were made in the CRACLScalibration model, i.e., no information from the repeat sample was givento the original CRACLS model to compute new pure-component spectra.Instead, the sources of spectral variation from the repeat sample shownin FIGS. 9A and 9B were used (without concentration information) toaugment the pure-component matrix during prediction. The advantage ofthis PACLS augmentation is that the prediction model can be rapidlyupdated to account for new sources of spectral variation in theprediction data set. Recomputing the calibration model using theoriginal and the additional spectra from the repeat sample would takeconsiderably more computation time. The spectral difference between theaverage of the repeat spectra from the calibration and prediction datasets, shown in FIG. 9A, was added to the prediction model to capture thelong-term spectrometer drift between the calibration and predictiondays. Without the addition of this mean-difference spectrum, thepredicted values show a definite bias. In addition, all themean-centered repeat spectra for the prediction repeat sample, shown inFIG. 9B, were added to eliminate the detrimental effects of anyshort-term instrument drift during the collection of the predictionsample spectra.

[0141] By mean centering, the spectral contribution of the repeatsample's specific concentration is removed, leaving only the variationfrom non-chemical sources. Note the sharp features in the mean-centeredspectra at approximately 8800 cm⁻¹ and 10600 cm⁻¹ in FIG. 9B. Thesefeatures are the result of water vapor variations caused by the changesin the quality of the purge. Other less obvious spectrometer driftfeatures contribute to the spectral variation as well. The backgroundspectra are inadequate to correct for all sources of spectrometer drift.In fact, prediction results were best when an average background, ratherthan individual backgrounds, was used to obtain the absorbance spectra.

[0142] Table III shows the standard error of predictions (SEPs) for allthe minor constituents in the prediction data set from various CLS, PLSand the present invention methods using calibration models based uponthe calibration data set and the repeat sample information. TABLE IIICVSEP for PLS and SEPs for CLS, PLS, and CRACLS on the prediction dataset using models containing the repeat sample spectral information. Com-PLS- PLS- CRACLS/ CRACLS/ ponent PLS CLS A^(a) B^(a) PACLS(10)^(b)PACLS(20)^(b) Glucose 18 287 55 (6)  21(12) 19 19 (mg/dl) Ethanol 5 27 9(8)  4 (13) 5 4 (mg/dl) Urea 5 42  4 (11)  4 (12) 4 4 (mg/dl)

[0143] As a comparison standard, the first column of Table III providesthe CVSEP obtained from a separate PLS calibration obtained from theprediction data. As expected, CLS without the information about theinterferents had poor prediction ability. PLS-A in Table III representsPLS recalibration with the optimal number of PLS factors selected duringthe sample-out cross-validated rotation. The results show that the modelwas inadequate, even though this method of rotating spectra out duringcross-validation has generally been recommended to avoid overfitting thedata.

[0144] Therefore, the cross-validated rotation was modified duringrecalibration so that each of the repeat spectra from the prediction setwas rotated out one at a time (spectrum-out rotation). By using thespectrum-out cross-validated rotation, a more correct number of factorsare selected, and the PLS-B model in Table III predicted well for thesedata. Cross-validated rotation removing individual spectra was preferredin this case over rotation removing all repeat spectra for a givensample since the repeat spectra were added to capture the system drift,not the component concentrations. By rotating all repeat spectra out atonce, the F-test on the cross-validated calibrations did not adequatelygauge the impact of the drift on the residuals for factor selection.While the spectra-out rotation worked better for this data set, it canlead to severe overfitting in other data sets. Adding several spectrafor the same sample can over emphasize that sample in the model, whichcould lead to overfitting. Consequently, a careful analysis on acase-by-case basis is required to determine the appropriate type ofspectral rotation during cross-validation for PLS. The proper choice ofrotation for PLS will depend on the relative effects of theconcentrations and the spectrometer drift or other sources of spectralvariations that need to be modeled.

[0145] The results for present invention models using 10 and 20augmentations during the CRACLS calibration are also presented in TableIII. Again, it can be seen that the present invention compares favorablywith PLS. Since the CRACLS model was developed using all components andthe PLS model uses only one component at a time, it could be argued thatthe comparison is not valid. Therefore, the results were recalibratedusing the present invention with 20 augmentations and the concentrationsfrom only one component at a time to generate three separate models.Applying these models to the prediction data set, the SEPs for glucose,ethanol and urea were 20, 4, and 4 mg/dL, respectively. Essentially,these results matched the values derived when all the components wereused in the CRACLS calibration model. Clearly, either way, the presentinvention method effectively removed the impact of all sources ofspectral variation not represented by the given/known concentrations.Notice, however, that with CRACLS there is no problem with selecting arotation method since the mean-centered repeat spectra are simplyaugmented to the estimated pure-component spectra during prediction.

[0146] When there is more than one known component, the concentrationresidual vector to use for augmentation needs to be chosen. To do this,the impact that the unmodeled components will have on each of theconcentration residuals needs to be considered. From Eq. (15), theconcentration residual vector is related to the projection of theunmodeled pure-component spectra, K_(u), onto the space spanned by themodeled pure-component spectra, K, i.e. the portion of the unmodeledpure spectra that overlaps with the modeled pure spectra. If one of theunmodeled and one of the modeled pure-component spectra happen to beorthogonal, the concentration residual from that modeled component willnot provide any information about the concentrations of the unmodeledcomponent. However, in practice, orthogonal pure-component spectrararely occur since orthogonality generally implies that there is nospectral overlap. Even using simulated data with pure-component spectraconfigured to be nearly orthogonal, augmentation with the concentrationresidual vector was still sufficient to produce a good predictive model.

[0147] Returning now to the simulation, to demonstrate the applicabilityof the various component residuals, a calibration model was developedusing the simulated data, leaving out only the glucose concentrationsduring calibration. The concentration residuals vs. the referenceconcentration for the three components included in the calibration areshown in FIGS. 10A, 10B and 10C. The absolute value of the residualsvaries from 20 to 100 mg/dL depending on the analyte. However, thepattern of the sample residuals for each of the components is basicallythe same, indicating that the contamination from glucose contributedapproximately the same relative error to each of the samples for each ofthe components. Consequently, any of these component residuals will givestatistically equivalent CRACLS models since the magnitude of thedifferences will only change the scaling factor of the estimated glucosepure-component spectrum corresponding to the augmented residual vector,C_(u), in Eq. (15). The resulting NAS of the known components will beessentially identical regardless of which residual vector is selected.

[0148] Another choice for CRACLS is selection of the number ofaugmentations to use in the model. When using PLS, it is important toavoid using too many factors since overfitting the concentration datawill degrade the prediction model. The excessive PLS factors haveconcentration residuals associated with them that can degrade theconcentration predictions if they are included in the model. Finding thecorrect number of augmentations in CRACLS, however, may be less criticalthan choosing the optimal number of factors for PLS. Each additionalloading vector generated by the concentration residual augmentation inCRACLS results in a reduction of the NASs for the analytes of interest.After the major unmodeled sources of spectral variation have beenremoved, the addition of more concentration residual vectors willgenerate estimated pure-component spectra that represent primarilyrandom noise. Since random spectral noise is nearly orthogonal to thepure-component spectra of the known analytes, the impact on the NAS willbe minimal. The insensitivity of CRACLS to the number of augmentationswas demonstrated in the example above using real data. As shown inTables II and III, the models generated using 10 or 20 augmentationsgave equivalent results with no evidence of overfitting. A sufficientnumber of residual vectors need to be added to capture all of theunmodeled sources of spectral variation, but adding more residualvectors did not degrade the results. For these data, the CVSEPs forCRACLS dramatically decreased with increasing augmentations as realsources of spectral variation were modeled, and then the CVSEPs justgradually continued to decrease. The apparent insensitivity of CRACLS tothe number of augmentations is an advantage over PLS since, at times, itis difficult to determine the optimal number of factors for PLS. TheF-test for PLS can also be used on the cross-validated calibrations toselect the number of augmentations in CRACLS. This F-test often resultsin selecting the maximum or near the maximum number of augmentationscomputed since the CVSEPs often just gradually decrease with eachaugmentation.

[0149] An important observation is that the PACLS-augmented predictionmodels are mathematically identical whether the estimated of measuredspectral shapes representing additional unmodeled sources of spectralvariation in the prediction data set are added to {circumflex over (K)}of Eq. (10) during the cross validation in the calibration phase or to$\hat{\overset{\sim}{K}}$

[0150] before performing prediction. In addition, equivalent predictionability was achieved with CRACLS by using all the componentssimultaneously or by using the concentrations one component at a time.However, to enhance the sensitivity to outlier detection and to achievebetter qualitative information in the estimated pure-component spectra,it is recommended to include all known reference concentrations in thecross-validated calibration.

EXAMPLE 2 ACLS Method Applied to the Analysis of Hyperspectral ImageData

[0151] Multivariate methods use a large number of spectral channels totake advantage of the signal averaging and resolving power inherent inspectral data. The multivariate nature of the spectra and the massiveamount of data in hyperspectral images allow the characterization ofsystems with low signal-to-noise ratios and with multiple componentswhose spectral features overlap. Multivariate data also provide theopportunity to correct systematic spectral artifacts and aberrationsintroduced by the imaging instrument.

[0152] The quantitative analysis of hyperspectral image data collectedfrom FT-IR imaging systems that employ focal plane array (FPA) detectorscan be greatly hampered by the presence of system artifacts in thespectral and image data. Methods are required to eliminate thedetrimental effects of these systematic artifacts if they cannot beeliminated experimentally. The characteristics of the systematicartifacts can be captured in the error covariance structure of the dataestimated from repeat spectral image differences. By coupling thegeneralized ACLS method of the present invention with improvedmultivariate curve resolution (MCR) techniques and estimates of theerror covariance structure of the data, the detrimental effects of thesystematic artifacts present in hyperspectral imaging FT-IR systems canbe greatly minimized.

[0153] MCR is a constrained alternating classical least squares method(alternating between CLS calibration and CLS prediction) used when thereference variables, such as pure-component spectra or componentconcentrations, are inadequately known for the standard CLS method. SeeR. Tauler et al., “Multivariate Curve Resolution Applied to SpectralData from Multiple Runs of an Industrial Process,” Anal. Chem. 65(15),2040 (1993) and Bro and Dejong, “A fast non-negativity-constrained leastsquares algorithm,” J. of Chemometrics 11(5), 393 (1997), both of whichare incorporated herein by reference. Rather, MCR methods useconstraints, such as non-negativity, equality, closure, monotonicconstraint, unimodality, or selectivity, to achieve a solution.Therefore, MCR seeks to identify components of a data set by usingalternating classical least squares to iteratively obtain solutions foreach set of variables that putatively constitute the data. Because ofthe adaptability of the present invention and the fact that MCR employsa series of alternating classical least squares steps, ACLS methods canbe readily added to the MCR algorithms to improve the componentidentification with such data sets.

[0154] Alternating Classical Least Squares

[0155] Conventional MCR methods start with a guess for either the K or Cmatrices, using random numbers or using other methods to arrive at areasonable first guess, such as SIMPLISMA. See W. Windig and D. A.Stephenson, Anal. Chem. 64, 2735 (1992). If the starting guess is K,then CLS prediction is performed, with the appropriate constraint(s)applied (non-negativity, equality, etc.) to narrow the range ofpotential solutions, to provide a predicted component values matrix Ĉ,according to:

Ĉ=AK ^(T)(KK ^(T))⁻¹ =A(K ^(T))⁺  (19)

[0156] where A is the matrix of measured spectra for the unknown sample.Alternating to CLS calibration, estimated pure-component spectra{circumflex over (K)} can then be obtained, again using constraints,according to:

{circumflex over (K)}=(Ĉ ^(T) Ĉ)⁻¹ Ĉ ^(T) A=Ĉ ⁺ A  (20)

[0157] The steps depicted by Eqs. (19) and (20) can be repeated,updating the pure-component spectra K in Eq. (19) for each iterationwith new estimated pure-component spectra {circumflex over (K)} from Eq.(20), and testing convergence according to:

∥A−Ĉ{circumflex over (K)}∥ ²  (21)

[0158] until the set of iterations converges, minimizing the constrainedclassical least squares solution (i.e., a measure of the spectralresidual E_(A) is minimized).

[0159] Augmented Alternating Classical Least Squares

[0160] For any alternating least-squares MCR method, the calibration andprediction CLS steps can be augmented with information that it isdesired for the least-squares fit to ignore (e.g., vectorsrepresentative of known or estimated spectral components or any estimateof the error covariance of the system). An estimate of the errorcovariance can be obtained, for example, by taking repeat spectra or, inthe case of spectral imaging, repeat spectral images. Each least-squarescalibration step in MCR iteration can be augmented by the scoresobtained from the factor analysis of the error covariance matrix or eachleast-squares prediction step can be augmented by the loading vectorsfrom the same matrix, or both steps can be augmented. Additionalconstraints can be applied to the non-augmented portion only of theaugmented matrices. In this manner, quantitative analysis of spectralimages can proceed without the need for standards even in the presenceof large error covariance in the data. In general, ACLS methods will beuseful in the analysis of any type of spectral data where estimates ofthe error covariance of the data can be obtained.

[0161] In FIG. 11 is shown an augmented alternating CLS embodiment ofthe present invention. A starting guess is made of the pure-componentspectra K at step 200. This starting guess can also be a matrix ofrandom numbers. In addition, an estimate of the error covariance E_(A)of the measured spectra A is obtained. With spectral image data, thisestimate of the error covariance E_(A) can be obtained from thedifference of repeat spectral images or from shift differences obtainedfrom a single spectral image when repeat spectral images are notavailable. The estimate of the error covariance E_(A) can then be factoranalyzed into the product of T scores and P loading vectors. A preferredfactor analysis method is PCA. The guessed pure-component spectra K canbe augmented with one or more of the vectors of the P loading vectors atstep 260 to obtain augmented pure-component spectra {tilde over (K)}.The guessed pure-component spectra K can also be augmented with one ormore vectors representing spectral shapes that are representative of atleast one additional source of spectral variation according to the PACLSmethod. A first set of augmented component values$\hat{\overset{\sim}{C}}$

[0162] can then be predicted at step 110 according to: $\begin{matrix}{\hat{\overset{\sim}{C}} = {{A\quad {{\overset{\sim}{K}}^{T}\left( {\overset{\sim}{K}\quad {\overset{\sim}{K}}^{T}} \right)}^{- 1}} = {A\left( {\overset{\sim}{K}}^{T} \right)}^{+}}} & (22)\end{matrix}$

[0163] where A is the matrix of measured spectra for the unknown samplefrom step 210. Additional constraints can be applied to thenon-augmented portion of $\hat{\overset{\sim}{C}}$

[0164] at step 220. The first set of augmented component values$\hat{\overset{\sim}{C}}$

[0165] can be augmented with selected T scores at step 230 by replacingthe augmented values in $\hat{\overset{\sim}{C}}$

[0166] with the appropriate scores from T to further improve thesolution. An estimated augmented pure-component spectral matrix {tildeover ({circumflex over (K)})} can be obtained at step 90 according to:$\begin{matrix}{\hat{\overset{\sim}{K}} = {{\left( {{\hat{\overset{\sim}{C}}}^{T}\hat{\overset{\sim}{C}}} \right)^{- 1}{\hat{\overset{\ldots}{C}}}^{T}A} = {{\hat{\overset{\sim}{C}}}^{+}A}}} & (23)\end{matrix}$

[0167] Constraints can be applied to the non-augmented portion of$\hat{\overset{\sim}{K}}$

[0168] at step 240 to further narrow the range of possible solutions.The first set of augmented component values $\hat{\overset{\sim}{C}}$

[0169] from step 110, or the replaced set of augmented component values$\hat{\overset{\sim}{C}}$

[0170] from step 230, and the estimated augmented pure-component spectra$\hat{\overset{\sim}{K}}$

[0171] from step 90 can then be used to test for convergence at step 250according to:${{A - {\hat{\overset{\sim}{C}}\quad \hat{\overset{\sim}{K}}}}}^{2}$

[0172] These CLS prediction and calibration steps with constraints andaugmentation can be repeated in a series of iterations indicated by thecircular arrow until the set of iterations converges, minimizing theconstrained augmented classical least squares solution. Alternatively,the initial guess can be C with the CLS calibration being performedfirst. The procedure is then similar to the above-described modified MCRmethod with only the starting point changing. These modified MCR methodscan be applied to any set of multivariate data, and the use of augmentedCLS methods is demonstrated below with spectroscopic image data.

[0173] To demonstrate the improvements possible with augmented CLSmethods, the MCR analysis with augmentation were applied to a FT-IRspectral image obtained from a sample of neoprene thermally aged in airat elevated temperatures containing system artifacts in the image data.With the ACLS method applied to the spectral image data, augmentationwith scores improved the pure-component spectral estimates whileaugmentation with the corresponding eigenvectors improved primarily therelative concentration maps predicted from the image data. Augmentationwith both scores and loading vectors (eigenvectors) improved both thepure-component spectra and the concentration maps, even without therequirement for calibration standards.

[0174] Experimental

[0175] A sample of neoprene rubber was investigated. To obtain adequatesignal to noise in the FT-IR image data, the neoprene sample did notcontain the normal carbon filler found in many neoprene rubbers. Thesample that was used to illustrate the methods of the present inventionwas obtained from a 2 mm×4 mm×4 mm block of neoprene that was aged for 6days in an air furnace held at a temperature of 140° C. The sample wasthen potted in epoxy and the center of the sample was cut with amicrotome to an approximate thickness of 15 μm. However, independentestimates of the sample thickness as a function of position on thesample indicated that the thickness varied by a factor of greater thanthree across the sample.

[0176] The spectrometer used in this study was a BioRad Stingraystep-scan FT-IR imaging spectrometer equipped with amercury-cadmium-telluride (MCT) FPA detector that had 4096 (64×64)detector elements. The sample was placed in the macro samplingcompartment that provided a 4 mm×4 mm transmission image to be obtainedof the thin 2 mm×4 mm aged neoprene sample. The sample was pressed flatonto a KBr window for support. Background single-beam images wereobtained from a clear region of the KBr window next to the neoprenesample. Sample single-beam transmission image spectra were then obtainedof the neoprene sample. The spectra were collected at a nominal spectralresolution of 16 cm⁻¹ using a step rate of 2.5 steps/sec and co-adding64 images at each step of the interferometer. Data collection time wasapproximately 5 minutes for each spectral single-beam image. Three imagespectra were obtained in rapid succession of the KBr window backgroundbefore moving the sample into the beam. Three image spectra were thenobtained from the neoprene sample without moving the sample. Two ofthese repeat image spectra were used to obtain an estimate of the errorcovariance structure of the data.

[0177] Analysis

[0178] The spectral data were collected with the use of the Bio-RadWin-IR Pro software that came with the Stingray spectrometer. However,spectral analyses were performed using the Matlab code for the MCR,ACLS, and SIMPLISMA algorithms.

[0179] Small numbers of spectra from each image were clear outliers andwere removed from the analysis. These outliers generally exhibitedspikes or steps in the raw interferograms. The sample images weretrimmed to remove all pixels representing just the KBr window and mixedpixels at the extreme boundaries of the sample that contained sample andKBr window. Only pixels present in all single-beam images were retainedfor final analysis. The sample spectra were ratioed to backgrounds andconverted to absorbance. Large, relatively complex baseline variationswere present in these spectra. The presence of these baseline variationscomplicated attempts to extract pure-component spectra. Therefore, thespectra were preprocessed by isolating the spectral features of thesample and performing a separate linear baseline correction under eachspectral band for each spectrum in the image.

[0180] When pathlength varies over the imaged area of the sample,concentration information can be confounded with pathlength changes,since pathlength and concentration changes are generallyindistinguishable. These pathlength variations can be largely correctedif the Beers-Lambert law is closely followed and if a spectral regioncan be identified that represents only pathlength variations. Since theneoprene rubber sample thickness was not uniform over the sample area,pathlength corrections to the data were required to obtain meaningfulconcentration maps of the various components across the sample. Toperform pathlength corrections without direct measurements of the samplethickness, a spectral band that was most representative of samplepathlength was identified. This band was selected among the sixbaseline-corrected bands by performing a principal components analysis(PCA) on each spectral band and examining the results carefully.Necessary conditions for a spectral band to be representative of samplepathlength are that the band be present in all pixels of the image, thatit has a consistent spectral shape in all pixels, and that onlymultiplicative changes are present in the band. Spectral bands thatfulfill these conditions of only a multiplicative change will have thefirst PCA eigenvector represent nearly all the variance, and the firsteigenvector will have the same shape as the average spectrum of theselected band. Of the six bands identified in the neoprene sample, the1440 cm⁻¹ band most nearly followed these criteria and was, therefore,selected as representing sample pathlength.

[0181] Once an appropriate spectral band has been selected forpathlength correction, pathlength correction terms can be determined foreach spectrum in the hyperspectral image. Relative pathlength for eachpixel was determined by assigning a relative pathlength of one to theaverage spectrum of the 1440 cm⁻¹ band and using the average spectrum ina CLS prediction of all sample image spectra. The CLS analysis alsoincluded a simultaneous fit of linear baseline components to the band.Relative pathlengths varied from 0.13 to 2.5. The thinnest pathlengthswere at the edges of the sample. Because the spectra contained artifactsfrom the imaging system, the spectra were not scaled for pathlengthsince this process could serve to increase the variance of the artifactin the image. Rather, the MCR was performed on the uncorrected spectra,and the estimated concentrations were corrected for relative pathlengthvariations.

[0182] The primary goal of the analysis was to obtain the pure-componentspectra and concentration maps of each component in the image directlyfrom the spectral data, since standards were not available for thesesamples. As described above, quantitative analysis without standards canbe accomplished with the MCR algorithm using constrained alternatingleast squares in an iterative process. Constraints were applied to limitthe range of possible solutions in the analysis and minimize thedetrimental effects of the system artifacts on the analyses. Theseconstraints comprised both non-negativity of the concentrations and thepure-component spectra, since real concentrations and spectra willconsist of all positive values, and equality constraints.

[0183] The MCR analysis requires a starting point for the pure spectraand/or their corresponding concentrations. In the analysis describedherein, pure component spectra obtained from the SIMPLISMA algorithmwere used as starting points. These pure-component spectra are generatedby finding and extracting the purest spectral bands in the spectral datasets one at a time. From these purest pixels, often reasonable initialestimates of the pure-component spectra can be obtained. Random numberscan also be used as starting points.

[0184] Initially, MCR was performed starting with the SIMPLISMAestimated pure-component spectra using only non-negativity constraints.The constrained alternating least squares algorithm used was amodification of the fast non-negative least squares algorithm from thepreviously referenced Bro et al. Changes to Bro's algorithm were made tospeed the convergence of the MCR analysis. The spectral data wereheavily contaminated by systematic errors, probably due to readoutproblems from the FPA detector, that severely degraded the quality ofthe both the pure-component spectra and the concentration maps generatedfrom the MCR analysis. The result is the presence of large correlatedresiduals in the spectral domain and a banding artifact in theconcentration maps.

[0185] The standard method of applying MCR with non-negativityconstraints to spectral data can be improved by augmentation of thecalibration and prediction solutions with estimates of the errorcovariance structure in the spectra. All of the current implementationsof the MCR algorithms assume that the spectral errors are random, ofconstant variance and independent (i.e., uncorrelated). Theseassumptions were violated in the FT-IR image data. The ACLS methods ofthe present invention enable the MCR analysis to be corrected for thepresence of non-constant and correlated spectral errors in the data. Tocorrect the MCR analysis using the ACLS methods, an estimate of theerror covariance structure of the spectral image data is required. Thisestimate can be obtained from the repeat image spectral differences ofthe sample. If the sample does not move and does not change over time,then any changes in the image spectra at a given pixel are related tosystematic changes in the system response as well as random noise.

[0186] In the past, E_(A) in Eq. (1) has generally been assumed toconsist of identically distributed uncorrelated errors of zero mean.However, instrument drift, system artifacts, and a variety of possibleerrors that occur in collecting the interferogram can generatecorrelated error structure. The conversion to absorbance also has theeffect of generating errors with non-uniform error variance. The errorcovariance structure of E_(A) has often been estimated in remote sensingapplications with hyperspectral images with a shift difference generatedfrom a single hyperspectral image. This shift difference assumes thatthe signal between pixels is slowly varying and, therefore, the spectraldifference between adjacent pixels is representative of noise. The shiftdifference must be used if direct repeat image spectra cannot beobtained, for example in the case of a moving detector in remote sensingapplications. However, in the laboratory, repeat image spectra can beobtained, and the difference between these images can be used to obtainan estimate of the error covariance structure of E_(A). Since errors inthe image data described herein are correlated, a PCA analysis of E_(A)can serve to separate the correlated errors from the random error.

[0187] In Eq. (6), TP is the product of scores and eigenvectors from thePCA of E_(A) for the r scores and eigenvectors that represent thesignificant correlated errors contained in E_(A). The number ofeigenvectors was selected based on a visual examination of thecorresponding score images. Score images with nonrandom patterns wereretained. E then contains primarily the random errors that are expectedin any least squares analysis. To estimate C and K without thedetrimental influence of T and P that can contaminate the concentrationand pure-component spectral estimates, C can be augmented with T duringcalibration, or K can be augmented with P during prediction, or both. Byaugmenting columns of C with selected columns from T, the least squaresestimate of K is forced to ignore any pattern in the image that has thesame spatial distributions as any given set of column scores in the Tvectors that are used to augment C. By augmenting the rows of K withselected rows from P, the least squares estimate of C is required toignore any spectral shape contained in any given row eigenvector of Pvectors that are used to augment K. The corresponding ACLS calibrationand prediction equations that are part of the alternating least squaresMCR procedure are then given by equations (22) and (23), respectively.In practice, the MCR method implements the augmentation by usingequality constraints to force the estimates of K and C to ignore sourcesof spectral and spatial variation that are contained in the augmented$\hat{\overset{\sim}{C}}$

[0188] and $\hat{\overset{\sim}{K}}$

[0189] matrices.

[0190] Results from the Experimental Data

[0191] A visible video image of the approximate portion of the agedneoprene sample imaged by the FT-IR spectrometer is displayed in FIG.12. A slight darkening from the center to the edge of the sample is theonly indication in the visible image of the degradation of the sampledue to the aging. The 1269 spectra from image pixels representing thesample are presented in FIG. 13A. Note that there are significantcomplex baseline variations present in these data. The spectra that weresubjected to linear baseline correction for each of the six significantspectral bands of neoprene are presented in FIG. 13B. The preprocessedspectra in FIG. 13B were subjected to the MCR analysis.

[0192] As mentioned in the Experimental section, systematic artifactswere present in the FT-IR spectral image data. These artifacts were notalways readily observable in the raw interferograms, single-beamspectra, or absorbance spectra. However, a PCA analysis of the spectralimage data revealed the presence of the artifact as a systematic stripedor banded pattern in the score images from the PCA analysis. Asindicated in FIG. 14, this striped pattern was apparent in the PCAscores of interferograms, single-beams, and absorbance spectra obtainedfrom the spectra of different sample or background images. The relativemagnitude of the artifact varied considerably between various image datasets as evidenced by how many eigenvectors were generated before theartifact became noticeable. The data in FIG. 14 illustrates collectedimage data where the presence of the artifact varied from very small tosignificant as indicated by the fact that the striping artifact firstbecomes visible in factor 2, 5, or 10, depending on the image. Todemonstrate the value of the ACLS methods applied to the MCR analysis, aspectral image where the artifact was a significant portion of the datawas chosen.

[0193] Not only is the artifact clearly present in the PCA score images,it is also present in the spectral data. In FIG. 15 are shown themean-centered difference spectra between repeat images of the agedneoprene sample, in which the systematic error (noise) can be observed.Clearly, the noise in the difference spectra in FIG. 15 is not uniform.In fact, as expected by theory, the noise is higher in those regionswhere the absorbances are larger. Thus, the spectral noise is higherwhere the neoprene sample absorbs the IR beam. What is not clear fromFIG. 15 is that the noise in the sample is also correlated betweenspectral frequencies. Thus, if the error is high at frequency p, ittends to be high at surrounding frequencies. The fact that the noise ishigh in broad spectral regions is not necessarily an indication ofcorrelated error; it could simply be an indication that the noise ishigher in broad spectral regions but uncorrelated between frequencies.

[0194] The non-uniform and correlated nature of the noise in thespectral images can best be observed by forming the error covariancematrix from the mean-centered repeat image spectral differences. Thistwo-dimensional representation of the spectral error in the image isshown in FIG. 16A. FIG. 16B shows the ideal noise behavior of uniformvariance and uncorrelated errors (i.e., only diagonal elements ofconstant magnitude). Clearly, the true error covariance structure of thespectral image data has non-uniform diagonal variance components andlarge off-diagonal terms, demonstrating that errors are highlycorrelated between frequencies. Thus, the errors observed from themeasured spectral image data are a far departure from the ideal behaviorpresented in FIG. 16B.

[0195] The non-constant noise variance for infrared absorbance spectrais expected, since the log transformation required to obtain absorbancespectra takes noise variance that is expected to be constant at allfrequencies for MCT detectors and makes the noise nonuniform over thespectral frequencies. The expected form of the noise variance of thedifference of two repeat absorbance spectra at a single frequency can beestimated as outlined below. $\begin{matrix}{A = {{\log \frac{\left( {I_{s} + \sigma_{s1}} \right)}{\left( {I_{s} + \sigma_{s2}} \right)}} = {{\log \left( {I_{s} + \sigma_{s1}} \right)} - {\log \left( {I_{s} + \sigma_{s2}} \right)}}}} & (25)\end{matrix}$

[0196] where I_(s) is the noise-free single-beam intensity of the tworepeat sample spectra and σ_(s1) and σ_(s2) are the standard deviationsof the noise from repeat spectra 1 and 2, respectively. From Eq. (25),estimates of the error variances in absorbance (σ_(A) ²) can be obtainedby taking a Taylor series expansion of the error variance in thesingle-beam intensities in Eq. (25) and retaining only the firstnon-zero term for both terms on the right-hand side of Eq. (25). Theresult is: $\begin{matrix}{{\sigma_{A}^{2} \approx {{\left( \frac{\delta \quad A}{\delta \quad I_{s}} \right)\sigma_{s1}^{2}} + {\left( \frac{\delta \quad A}{\delta \quad I_{s}} \right)\sigma_{s2}^{2}}}} = {{\frac{\sigma_{s1}^{2}}{I_{s}^{2}} + \frac{\sigma_{s2}^{2}}{I_{s}^{2}}} \approx \frac{2\sigma_{s}^{2}}{I_{s}^{2}}}} & (26)\end{matrix}$

[0197] where the last term on the right-hand side of Eq. (26) assumesthat the error variance is comparable for both repeat single-beamspectra. From Eq. (26), the error variance for the absorbance spectra isinversely proportional to the square of the single-beam intensity forthe infrared spectrum. In a simple weighted least squares fit of thedata, the weights would be proportional to the square root of theinverse of the spectral absorbance variance. If experimental estimatesfollow the expected theory, then the approximate proper weights would beproportional to the average single beam spectrum obtained from the imagespectra.

[0198]FIG. 17 compares the estimated weighting function for absorbanceerror variance obtained from the repeat sample image differences and thetheoretical weighting function based on the average single-beamintensity obtained from the sample image data. Clearly, the experimentaland theoretical estimates for the weighting function are very similar,demonstrating that the repeat spectral differences can lead toreasonable estimates for the error variance of the absorbance spectraldata. Confirmation that the full covariance structure of the residual iscaptured by the repeat image spectral differences enables the use ofthis information to reduce or eliminate systematic errors during theimage analysis.

[0199] Conventional MCR methods were applied to the spectral image datapresented in FIG. 13B using non-negativity constraints on bothconcentrations and pure-component spectra. Three pure componentsappeared to be adequate to represent the original thermally modifiedneoprene and the decomposition products. Using more than three purecomponents yielded results that were not as physically meaningful asobtained with three components only. The normalized MCR pure-componentspectra obtained with three components are presented in FIG. 18. Thetrace corresponding to component 1 is characteristic of the thermallyaged but unoxidized neoprene sample. The component 2 trace indicates adecomposition product that contains a significant carbonyl decompositionproduct. The component 3 trace indicates that a second decompositionproduct is present in the aged sample that is characterized by ahydroxyl dominated decomposition product with very little formation of acarbonyl species. This second decomposition product has a differentspatial profile in the sample than the first. The high-frequency regionof the pure-component spectra is quite noisy.

[0200] Pathlength-corrected relative concentration maps representing thethree pure components are presented in FIG. 19. Some of the stripedartifact presented in FIG. 14 is clearly visible in at least two ofthese concentration maps, indicating that the MCR analyses are corruptedby this artifact. The carbonyl decomposition product (component 2 inFIG. 19) demonstrates a large concentration gradient that is high at theouter edges of the sample. The hydroxyl component decomposition productshown as component 3 in FIG. 19 is more variable and extends somewhatbeyond the region containing the carbonyl decomposition product. Thethermally aged but unoxidized neoprene shown as component 1 in FIG. 19is present somewhat uniformly throughout the sample although it may havebeen reduced near the edge of the sample by the oxidation.

[0201] MCR was then applied to the same spectral image data with theconcentration matrix augmented by the first five sets of scores from themean-centered repeat image spectral differences, according to the ACLSmethod of the present invention. The score-augmented MCR was implementedwith equality constraints on these five sets of image scores. Inapplying these equality constraints on the scores, the alternatingclassical least-squares generation of the pure-component spectra iscorrected for the presence of score-related concentration errors in theimage data. FIG. 20 presents the three pure-component spectra generatedfrom the MCR augmented with the scores. Note that the noise in thehigh-energy region of the pure-component spectra is greatly reduced inthese pure-component estimates as compared to FIG. 18. In addition, moredetailed structure becomes available in the OH stretching region of thespectra (3000−3700 cm⁻¹ spectral region) with score augmentation. Thederivative nature of the hydroxyl band for the carbonyl dominateddecomposition product (component 2) may indicate some hydrogen bondingwith the hydroxyl dominated decomposition product. Since theconcentrations were not corrected when augmenting with scores only, theconcentration maps indicate essentially no improvement in the stripingartifact relative to the standard non-augmented MCR analysis of the samedata.

[0202] MCR was then applied to the same spectral image data with thefirst five sets of eigenvectors from the mean-centered repeat imagespectral differences added to the pure-component spectral matrix. Theeigenvector-augmented MCR was implemented with equality constraints onthese five eigenvectors. With eigenvector augmentation only, there wassome improvement in noise of the high-energy side of thehydroxyl-dominated pure component, but there was little, if any,improvement in the noise on the other pure-component spectra. However,the concentration maps exhibit dramatic improvement in the quality. Thestriping artifact was nearly completely removed when the eigenvectorsaugment the MCR, as shown in FIG. 21. These improvements inconcentration maps were expected since the augmentation of thepure-component spectra with the eigenvectors corrects the concentrationestimates.

[0203] Finally, the MCR analysis was augmented with both scores andeigenvectors from the repeat image spectral differences. In thisembodiment of the present invention, one simply augments with scoresduring the calibration phase of the MCR and augments with theeigenvectors during the prediction phase. Thus, at each iteration in theMCR procedure, any estimates related to the augmented vectors arereplaced with the original eigenvectors from the PCA decomposition ofthe repeat image differences. The combined augmentation with scores andeigenvectors generates both improved pure-component spectra andcorrected concentration maps. The pure-component spectra with scores andeigenvector augmentation are presented in FIG. 22. These spectra havethe low noise of the scores-augmented MCR and are also slightlymodified, especially in the high-energy hydroxyl spectral region. Theconcentration maps with scores- and eigenvector-augmented MCR arepresented in FIG. 23. Like the eigenvector-augmented MCR concentrationmaps, these images exhibit greatly reduced striping artifact. Thereduction in the artifact is emphasized in FIG. 24, which shows thedifference between the standard MCR and the fully augmented MCRconcentration maps for all three components (i.e., the differencebetween the concentration maps in FIG. 23 and FIG. 19). As indicated bythis difference map, a significant reduction in the system artifact wasachieved when augmenting with estimates of the error covarianceinformation derived from the difference between two repeat images.

[0204] Having thus described the present invention with particularityand demonstrated its utility with respect to both simulated and actualexperimental data, those skilled in the art will appreciate that thepresent invention has broad applicability to a wide range ofopportunities for analyzing multivariate spectral data and obtainingboth qualitative and quantitative results. Many preprocessing proceduressuch as centering, scaling, path-length correction, smoothing,derivatives, multiplicative signal correction, PCA, wavelet compression,and covariance filtering have been developed that can be performed onthe measured spectra when using the present invention.

We claim:
 1. A method for analyzing multivariate spectral data,comprising the steps of: a) creating a calibration model for acalibration set of multivariate spectral data A by: i) obtaining a setof reference component values C representative of at least one of thespectrally active components in the calibration set of multivariatespectral data A, ii) estimating pure-component spectra {circumflex over(K)} for the at least one of the spectrally active components accordingto {circumflex over (K)}=(C^(T)C)⁻¹C^(T)A=C⁺A, iii) obtaining spectralresiduals E_(A) according to E_(A)=A−C{circumflex over (K)}, and iv)augmenting the estimated pure-component spectra {circumflex over (K)}with at least one vector of the spectral residuals E_(A) to obtainaugmented pure-component spectra $\hat{\overset{\sim}{K}};$

 and b) predicting a set of component values $\hat{\overset{\sim}{C}}$

 for a prediction set of multivariate spectral data A_(P) by: i) furtheraugmenting the augmented pure-component spectra$\hat{\overset{\sim}{K}}$

 with at least one vector representing a spectral shape that isrepresentative of at least one additional source of spectral variationin the prediction set, and ii) predicting the set of component values$\hat{\overset{\sim}{C}}$

 using the further augmented pure-component spectra$\hat{\overset{\sim}{K}}$

 according to$\hat{\overset{\sim}{C}} = {{A_{P}{{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}{\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {{A_{P}\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}.}}$


2. The method of claim 1, wherein augmenting step a)iv) is repeateduntil all sources of spectral variation are accounted for in thecalibration set.
 3. The method of claim 1, wherein an F-test is used toselect the number of vectors used to augment the estimatedpure-component spectra {circumflex over (K)}.
 4. The method of claim 1,further comprising the step of augmenting the estimated pure-componentspectra {circumflex over (K)} with at least one vector representing atleast one spectral shape representative of at least one additionalsources of spectral variation in the calibration set prior to stepa)iii).
 5. The method of claim 1, further comprising the step ofaugmenting {circumflex over (K)} to account for baseline variations inthe calibration set prior to step a)iii).
 6. The method of claim 1,further comprising the step of augmenting $\hat{\overset{\sim}{K}}$

to account for baseline variations in the calibration set prior to stepb).
 7. The method of claim 1, wherein the reference component values Ccomprise concentrations of the spectrally active components.
 8. Themethod of claim 1, wherein the spectrally active components are selectedfrom the group consisting of chemical specie, chemical interactionsamong species, physical variations, temperature variations, humidityvariations, spectrometer drift, changes in spectrometers, and insertioneffects.
 9. The method of claim 1, wherein the spectral residuals E_(A)are obtained from a full CLS model or a cross-validated CLS model. 10.The method of claim 1, wherein the spectral residuals E_(A) aredecomposed according to E_(A)=TP+E, where T is a set of n×r scores and Pis a set of r×p loading vectors obtained from factor analysis of thespectral residuals E_(A), and E is a set of n×p random errors andspectral variations not useful for prediction.
 11. The method of claim10, wherein the P loading vectors comprise the at least one vector ofthe spectral residuals E_(A) used to augment the estimatedpure-component spectra {circumflex over (K)} in step a)iv).
 12. Themethod of claim 10, wherein the factor analysis method comprisesprincipal components analysis.
 13. A method for analyzing multivariatespectral data, comprising the steps of: a) creating a calibration modelfor a calibration set of multivariate spectral data A by: i) obtaining aset of reference component values C representative of at least one ofthe spectrally active components in the calibration set of multivariatespectral data A, ii) estimating pure-component spectra {circumflex over(K)} for the at least one of the spectrally active components accordingto {circumflex over (K)}=(C^(T)C)⁻¹C^(T)A=C⁺A, iii) obtaining spectralresiduals E_(A) according to E_(A)=A−C{circumflex over (K)}, iv)decomposing the spectral residuals E_(A) according to E_(A)=TP+E where Tis a set of n×r scores and P is a set of r×p loading vectors obtainedfrom factor analysis of the spectral residuals E_(A), and E is a set ofn×p random errors and spectral variations not useful for prediction, v)augmenting the set of reference component values C with at least onevector of the T scores to obtain a set of augmented component values{tilde over (C)}, and vi) estimating augmented pure-component spectra$\hat{\overset{\sim}{K}}$

 according to${\hat{\overset{\sim}{K}} = {{\left( {{\overset{\sim}{C}}^{T}\overset{\sim}{C}} \right)^{- 1}{\overset{\sim}{C}}^{T}A} \approx {{\overset{\sim}{C}}^{+}A}}};$

 and b) predicting a set of component values $\hat{\overset{\sim}{C}}$

 for a prediction set of multivariate spectral data A_(P) according to$\hat{\overset{\sim}{C}} = {{A_{P}{{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}{\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {{A_{P}\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}.}}$


14. The method of claim 13, wherein the augmenting step a)iv) isrepeated until all sources of spectral variation are accounted for inthe calibration set.
 15. The method of claim 13, wherein an F-test isused to select the number of vectors used to augment the estimatedpure-component spectra {circumflex over (K)}.
 16. The method of claim13, further comprising the step of augmenting the estimatedpure-component spectra {circumflex over (K)} with at least one vectorrepresenting a spectral shape representative of at least one additionalsource of spectral variation in the calibration set prior to stepa)iii).
 17. The method of claim 13, further comprising the step ofaugmenting {circumflex over (K)} to account for baseline variations inthe calibration set prior to step a)iii).
 18. The method of claim 13,further comprising the step of augmenting the augmented pure-componentspectra {tilde over ({circumflex over (K)})} with at least one vectorrepresenting a spectral shape representative of at least one additionalsource of spectral variation in the prediction set prior to step b). 19.The method of claim 13, further comprising the step of augmenting$\hat{\overset{\sim}{K}}$

to account for baseline variations in the calibration set prior to stepb).
 20. The method of claim 13, wherein the reference component values Ccomprise concentrations of the spectrally active components.
 21. Themethod of claim 13, wherein the spectrally active components areselected from the group consisting of chemical specie, chemicalinteractions among species, physical variations, temperature variations,humidity variations, spectrometer drift, changes in spectrometers, andinsertion effects.
 22. The method of claim 13, wherein the step a)iv)comprises augmenting the set of reference component values C with a setof random numbers.
 23. The method of claim 13, wherein the spectralresiduals E_(A) are obtained from a full CLS model or a cross-validatedCLS model.
 24. The method of claim 13, wherein the factor analysismethod comprises principal components analysis.
 25. A method foranalyzing multivariate spectral data, comprising the steps of: a)creating a calibration model for a calibration set of multivariatespectral data A by: i) obtaining a set of reference component values Crepresentative of at least one of the spectrally active components inthe calibration set of multivariate spectral data A, ii) estimatingpure-component spectra {circumflex over (K)} for the at least one of thespectrally active components according to {circumflex over(K)}=(C^(T)C)⁻¹C^(T)A=C⁺A, iii) estimating a set of component values Ĉusing the estimated pure-component spectra {circumflex over (K)}according to Ĉ=A{circumflex over (K)}^(T)({circumflex over(K)}{circumflex over (K)}^(T))⁻¹=A({circumflex over (K)}^(T))⁺, iv)obtaining component residuals E_(c) according to E_(c)=Ĉ−C; v)augmenting the set of reference component values C with a vector of thecomponent residuals E_(c) to obtain a set of augmented component values{tilde over (C)}, and vi) obtaining augmented pure-component spectra$\hat{\overset{\sim}{K}}$

 from the set of augmented component values {tilde over (C)} accordingto${\hat{\overset{\sim}{K}} = {{\left( {{\overset{\sim}{C}}^{T}\overset{\sim}{C}} \right)^{- 1}{\overset{\sim}{C}}^{T}A} = {{\overset{\sim}{C}}^{+}A}}};$

 and b) predicting a set of component values $\hat{\overset{\sim}{C}}$

 for a prediction set of multivariate spectral data A_(P) according to$\hat{\overset{\sim}{C}} = {{A_{p}{{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}\quad {\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {{A_{p}\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}.}}$


26. The method of claim 25, further comprising repeating the followingsteps at least once prior to step b): a) estimating augmented componentvalues $\hat{\overset{\sim}{C}}$

 according to$\hat{\overset{\sim}{C}} = {{A\quad {{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}\quad {\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {A\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}}$

 using the augmented pure-component spectra $\hat{\overset{\sim}{K}}$

 from step a)vi), b) calculating updated estimated component values Ĉ′consisting of the estimated component values in$\hat{\overset{\sim}{C}}$

 corresponding to the known reference component values in C, c)obtaining new component residuals E_(c)′ according to E_(c)′=Ĉ−C, and d)further augmenting the set of augmented component values {tilde over(C)} with a vector of the component residuals E_(c)′ to obtain a new setof augmented component values {tilde over (C)} to be used to obtain newaugmented pure-component spectra $\hat{\overset{\sim}{K}}$

 in step a)vi).
 27. The method of claim 26, wherein an F-test is used toselect the number of times that steps a) through d) are repeated. 28.The method of claim 25, further comprising the step of augmenting{circumflex over (K)} with at least one vector representing a spectralshape that is representative of at least one additional source ofspectral variation in the calibration set prior to step a)iii).
 29. Themethod of claim 25, further comprising the step of augmenting{circumflex over (K)} to account for baseline variations in thecalibration set prior to step a)iii).
 30. The method of claim 25,further comprising the step of augmenting $\hat{\overset{\sim}{K}}$

with at least one vector representing a spectral shape that isrepresentative of at least one additional source of spectral variationin the prediction set prior to step b).
 31. The method of claim 25,further comprising the step of augmenting $\hat{\overset{\sim}{K}}$

account for baseline variations in the calibration set prior to step b).32. The method of claim 25, wherein the reference component values Ccomprise concentrations of the spectrally active components.
 33. Themethod of claim 25, wherein the spectrally active components areselected from the group consisting of chemical specie, chemicalinteractions among species, physical variations, temperature variations,humidity variations, spectrometer drift, changes in spectrometers, andinsertion effects.
 34. The method of claim 25, wherein the vectorcomprises a set of random numbers.
 35. A method of multivariate spectralanalysis, comprising the steps of: a) obtaining at least two referencevariables for a first sample set comprised of at least one spectrallyactive component, b) estimating at least one of the reference variablesfor the first sample set, c) obtaining a residual between the at leastone of the reference variables and its corresponding at least oneestimated reference variable, d) augmenting the at least one of thereference variables with a measure of the corresponding residual, and e)predicting a value of at least one variable in a second sample set usingthe augmented reference variable obtained from the first sample.
 36. Themethod of claim 35, wherein the reference variables comprise spectra forthe first sample set and component values of at least one component inthe first sample set.
 37. The method of claim 35, wherein theaugmentation step d) is repeated until all sources of spectral variationare accounted for in the first sample set.
 38. The method of claim 35,further comprising augmenting the at least one of the referencevariables with at least one random number.
 39. A method of multivariatespectral analysis, comprising the steps of: a) obtaining an estimate ofspectral error covariance E_(A) for measured set of multivariatespectral data A; b) decomposing the spectral error covariance E_(A)according to E_(A)=TP+E, where T is a set of n×r scores and P is a setof r×p loading vectors obtained from factor analysis of the spectralerror covariance E_(A), and E is a set of n×p random errors and spectralvariations not useful for prediction; c) guessing pure-component spectraK for the set of multivariate spectral data A; d) predicting a set ofcomponent values Ĉ according to Ĉ=AK^(T)(KK^(T))⁻¹=A(K^(T))⁺; e)augmenting the set of predicted component values Ĉ with at least onevector of the T scores to obtain a first set of augmented componentvalues $\hat{\overset{\sim}{C}};$

f) estimating augmented pure-component spectra $\hat{\overset{\sim}{K}}$

 according to${\hat{\overset{\sim}{K}} = {{\left( {{\hat{\overset{\sim}{C}}}^{T}\hat{\overset{\sim}{C}}} \right)^{- 1}{\hat{\overset{\sim}{C}}}^{T}A} = {\hat{\overset{\sim}{C^{+}}}A}}};$

g) testing for convergence according to${{A - {\hat{\overset{\sim}{C}}\hat{\overset{\sim}{K}}}}}^{2};$

h) predicting a second set of augmented component values$\hat{\overset{\sim}{C}}$

 according to${\hat{\overset{\sim}{C}} = {{A{{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}{\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {A\left( {\hat{\overset{\sim}{K}}}^{T} \right)}}};$

i) replacing the augmented portion of the second set of augmentedcomponent values $\hat{\overset{\sim}{C}}$

 with the at least one vector of the T scores to obtain a third set ofaugmented component values $\hat{\overset{\sim}{C}};$

 and j) repeating steps f) through i) at least once.
 40. The method ofclaim 39, wherein the steps f) through i) are repeated until the test ofstep g) converges to obtain an alternating classical least squaressolution for $\hat{\overset{\sim}{K}}$

and $\hat{\overset{\sim}{C}}.$


41. The method of claim 39, further comprising replacing the augmentedportion of the augmented pure-component spectra$\hat{\overset{\sim}{K}}$

with at least one vector of the P loading vectors prior to step h). 42.The method of claim 39, further comprising augmenting$\hat{\overset{\sim}{K}}$

with at least one vector representing a spectral shape that isrepresentative of at least one additional source of spectral variationprior to step h).
 43. The method of claim 39, further comprisingapplying at least one constraint to the non-augmented portion of$\hat{\overset{\sim}{K}}$

at step f).
 44. The method of claim 43, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.45. The method of claim 39, further comprising applying at least oneconstraint to the non-augmented portion of $\hat{\overset{\sim}{C}}$

at step h).
 46. The method of claim 45, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.47. The method of claim 39, wherein the guessed pure-component spectra Kcomprises random numbers.
 48. The method of claim 39, wherein themeasured set of multivariate spectral data A comprises image data. 49.The method of claim 48, wherein the spectral error covariance E_(A) isobtained from a shift difference generated from a single image.
 50. Themethod of claim 48, wherein the spectral error covariance E_(A) isobtained from repeat image spectra.
 51. A method of multivariatespectral analysis, comprising the steps of: a) obtaining an estimate ofspectral error covariance E_(A) for measured set of multivariatespectral data A; b) decomposing the spectral error covariance E_(A)according to E_(A)=TP+E, where T is a set of n×r scores and P is a setof r×p loading vectors obtained from factor analysis of the spectralerror covariance E_(A), and E is a set of n×p random errors and spectralvariations not useful for prediction; c) guessing pure-component spectraK for the set of multivariate spectral data A; d) augmenting thepure-component spectra K with at least one vector of the P loadingvectors to obtain first augmented pure-component spectra {tilde over(K)}; e) predicting a first set of augmented component values$\hat{\overset{\sim}{C}}$

 according to${\hat{\overset{\sim}{C}} = {A\quad {{\overset{\sim}{K}}^{T}\left( {\overset{\sim}{K}\quad {\overset{\sim}{K}}^{T}} \right)}^{- 1}{A\left( {\overset{\sim}{K}}^{T} \right)}^{+}}};$

f) estimating second augmented pure-component spectra$\hat{\overset{\sim}{K}}$

 according to${\hat{\overset{\sim}{K}} = {{\left( {{\hat{\overset{\sim}{C}}}^{T}\hat{\overset{\sim}{C}}} \right)^{- 1}{\hat{\overset{\sim}{C}}}^{T}A} = {{\hat{\overset{\sim}{C}}}^{+}A}}};$

g) testing for convergence according to${{A - {\hat{\overset{\sim}{C}}\hat{\overset{\sim}{K}}}}}^{2};$

h) replacing the augmented portion of the second augmentedpure-component spectra $\hat{\overset{\sim}{K}}$

 with the at least one vector of the P loading vectors to obtain thirdaugmented pure-component spectra $\hat{\overset{\sim}{K}};$

 and i) predicting a second set of augmented component values$\hat{\overset{\sim}{C}}$

 according to${\hat{\overset{\sim}{C}} = {{A\quad {{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}\quad {\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {A\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}}};$

j) repeating steps f) through i) at least once.
 52. The method of claim51, wherein the steps f) through i) are repeated until the test of stepg) converges to obtain an alternating classical least squares solutionfor $\hat{\overset{\sim}{K}}$

and $\hat{\overset{\sim}{C}}.$


53. The method of claim 51, further comprising replacing the augmentedportion of the set of augmented component values$\hat{\overset{\sim}{C}}$

with at least one vector of the T scores prior to step h).
 54. Themethod of claim 51, further comprising augmenting$\hat{\overset{\sim}{K}}$

with at least one vector representing a spectral shape that isrepresentative of at least one additional source of spectral variationprior to step h).
 55. The method of claim 51, further comprisingapplying at least one constraint to the non-augmented portion of$\hat{\overset{\sim}{K}}$

at step f).
 56. The method of claim 55, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.57. The method of claim 51, further comprising applying at least oneconstraint to the non-augmented portion of $\hat{\overset{\sim}{C}}$

at step h).
 58. The method of claim 57, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.59. The method of claim 51, wherein the guessed pure-component spectra Kcomprises random numbers.
 60. The method of claim 51, wherein themeasured set of multivariate spectral data A comprises image data. 61.The method of claim 60, wherein the estimate of the error covarianceE_(A) is obtained from a shift difference generated from a single image.62. The method of claim 60, wherein the estimate of the error covarianceE_(A) is obtained from repeat image spectra.
 63. A method ofmultivariate spectral analysis, comprising the steps of: a) obtaining anestimate of the spectral error covariance E_(A) for measured set ofmultivariate spectral data A; b) decomposing the spectral errorcovariance E_(A) according to E_(A)=TP+E, where T is a set of n×r scoresand P is a set of r×p loading vectors obtained from factor analysis ofthe spectral error covariance E_(A), and E is a set of n×p random errorsand spectral variations not useful for prediction; c) guessing a set ofcomponent values C for the set of multivariate spectral data A; d)estimating pure-component spectra {circumflex over (K)} according to{circumflex over (K)}=(C^(T)C)⁻¹C^(T)A=C⁺A; e) augmenting thepure-component spectra {circumflex over (K)} with at least one vector ofthe P loading vectors to obtain first augmented pure-component spectra$\hat{\overset{\sim}{K}};$

f) predicting a first set of augmented component values$\hat{\overset{\sim}{C}}$

 according to${\hat{\overset{\sim}{C}} = {{A{{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}{\hat{\overset{\sim}{K}}}^{T}} \right)}^{- 1}} = {A\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}}};$

g) testing for convergence according to${{A - {\hat{\overset{\sim}{C}}\hat{\overset{\sim}{K}}}}}^{2};$

h) estimating second augmented pure-component spectra$\hat{\overset{\sim}{K}}$

 according to${\hat{\overset{\sim}{K}} = {{\left( {{\hat{\overset{\sim}{C}}}^{T}\hat{\overset{\sim}{C}}} \right)^{- 1}{\overset{\sim}{C}}^{T}A} = {{\hat{\overset{\sim}{C}}}^{+}A}}};$

i) replacing the augmented portion of the second augmentedpure-component spectra $\hat{\overset{\sim}{K}}$

 with the at least one vector of the P loading vectors to obtain a thirdaugmented pure-component spectra $\hat{\overset{\sim}{K}}$

 and j) repeating steps f) through i) at least once.
 64. The method ofclaim 63, wherein the steps f) through i) are repeated until the test ofstep g) converges to obtain an alternating classical least squaressolution for $\hat{\overset{\sim}{K}}$

and $\hat{\overset{\sim}{C}.}$


65. The method of claim 63, further comprising replacing the augmentedportion of the set of augmented component values$\hat{\overset{\sim}{C}}$

with at least one vector of the T scores prior to step h).
 66. Themethod of claim 63, further comprising augmenting$\hat{\overset{\sim}{K}}$

with at least one vector representing a spectral shape that isrepresentative of at least one additional source of spectral variationprior to step h).
 67. The method of claim 63, further comprisingapplying at least one constraint to the non-augmented portion of$\hat{\overset{\sim}{K}}$

at step h).
 68. The method of claim 67, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.69. The method of claim 63, further comprising applying at least oneconstraint to the non-augmented portion of $\hat{\overset{\sim}{C}}$

at step f).
 70. The method of claim 69, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.71. The method of claim 63, wherein the guessed set of component valuesC comprises random numbers.
 72. The method of claim 52, wherein themeasured set of multivariate spectral data A comprises image data. 73.The method of claim 62, wherein the spectral error covariance E_(A) isobtained from a shift difference generated from a single image.
 74. Themethod of claim 62, wherein the spectral error covariance E_(A) isobtained from repeat image spectra.
 75. A method of multivariatespectral analysis, comprising the steps of: a) obtaining an estimate ofthe spectral error covariance E_(A) for measured set of multivariatespectral data A; b) decomposing the spectral error covariance E_(A)according to E_(A)=TP+E, where T is a set of n×r scores and P is a setof r×p loading vectors obtained from factor analysis of the spectralerror covariance E_(A), and E is a set of n×p random errors and spectralvariations not useful for prediction; c) guessing a set of componentvalues C for the set of multivariate spectral data A; d) augmenting theset of component values C with at least one vector of the T scores toobtain a first set of augmented component values {tilde over (C)}; e)estimating augmented pure-component spectra $\hat{\overset{\sim}{K}}$

 according to${\hat{\overset{\sim}{K}} = {{\left( {{\overset{\sim}{C}}^{T}\overset{\sim}{C}} \right)^{- 1}{\overset{\sim}{C}}^{T}A} = {{\overset{\sim}{C}}^{+}A}}};$

f) testing for convergence according to${{A - {\overset{\sim}{C}\quad \hat{\overset{\sim}{K}}}}}^{2};$

g) predicting a second set of augmented component values$\hat{\overset{\sim}{C}}$

 according to${\hat{\overset{\sim}{C}} = {{{A\quad {{\hat{\overset{\sim}{K}}}^{T}\left( {\hat{\overset{\sim}{K}}\quad {\hat{\overset{\sim}{K}}}^{T}} \right)}} - 1} = {A\left( {\hat{\overset{\sim}{K}}}^{T} \right)}^{+}}};$

h) replacing the augmented portion of the second set of augmentedcomponent values $\hat{\overset{\sim}{C}}$

 with the at least one vector of the T scores to obtain a third set ofaugmented component values $\hat{\overset{\sim}{C}}$

 and i) repeating steps e) through h) at least once, using the augmentedcomponent values $\hat{\overset{\sim}{C}}$

 in step f).
 76. The method of claim 75, wherein the steps e) through h)are repeated until the test of step g) converges to obtain analternating classical least squares solution for$\hat{\overset{\sim}{K}}$

and $\hat{\overset{\sim}{C}}$


77. The method of claim 75, further comprising replacing the augmentedportion of the augmented pure-component spectra$\hat{\overset{\sim}{K}}$

with at least one vector of the P loading vectors prior to step e). 78.The method of claim 75, further comprising augmenting$\hat{\overset{\sim}{K}}$

with at least one vector representing a spectral shape that isrepresentative of at least one additional source of spectral variationprior to step e).
 79. The method of claim 75, further comprisingapplying at least one constraint to the non-augmented portion of$\hat{\overset{\sim}{K}}$

at step e).
 80. The method of claim 79, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.81. The method of claim 75, further comprising applying at least oneconstraint to the non-augmented portion of $\hat{\overset{\sim}{C}}$

at step g).
 82. The method of claim 81, wherein the at least oneconstraint is selected from the group consisting of non-negativity,equality, closure, monotonic constraint, unimodality, and selectivity.83. The method of claim 75, wherein the guessed set of component valuesC comprises random numbers.
 84. The method of claim 75, wherein themeasured set of multivariate spectral data A comprises image data. 85.The method of claim 84, wherein the spectral error covariance E_(A) isobtained from a shift difference generated from a single image.
 86. Themethod of claim 84, wherein the spectral error covariance E_(A) isobtained from repeat image spectra.